ofa-sys / chinese-clip
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
License: MIT License
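Thank you for sharing! Were the roughly 200 million training image-text pairs crawled at random from websites? Was there a data filtering step? Do they include any public caption datasets? Looking forward to your reply.
On the demo page, searching for "戴眼镜的猫" (a cat wearing glasses) and "没戴眼镜的猫" (a cat not wearing glasses) both return cats wearing glasses. Can this be fixed?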
tritonserver image version: nvcr.io/nvidia/tritonserver:22.05-py3
model: ViT-H-14
bash error: creating server: Invalid argument - load failed for model 'clip-image-onnx': version 1 is at UNAVAILABLE state: Internal: onnx runtime error 6: Exception during initialization: /workspace/onnxruntime/onnxruntime/core/optimizer/optimizer_execution_frame.cc:78 onnxruntime::OptimizerExecutionFrame::Info::Info(const std::vector<const onnxruntime::Node*>&, const InitializedTensorSet&, const onnxruntime::Path&, const onnxruntime::IExecutionProvider&, const std::function<bool(const std::__cxx11::basic_string<char>&)>&) [ONNXRuntimeError] : 1 : FAIL : tensorprotoutils.cc:622 GetExtDataFromTensorProto External initializer: visual.transformer.resblocks.31.mlp.c_proj.weight offset: 1251033600 size to read: 13107200 given file_length: 391708810 are out of bounds or can not be read in full.
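The numbers in the error message can be checked by hand: the runtime is asked to read a tensor from the external-data file at an offset that lies past the end of the file it found. One reading (an interpretation, not a confirmed diagnosis) is that the external-data file next to the model is truncated or is not the one produced together with this model. A minimal sketch of the bounds check, using a hypothetical helper name:

```python
def external_read_in_bounds(offset: int, size: int, file_length: int) -> bool:
    """Return True if bytes [offset, offset + size) lie inside the file."""
    return offset >= 0 and size >= 0 and offset + size <= file_length

# Values copied from the error message above.
offset = 1251033600
size = 13107200
file_length = 391708810

print(offset + size)  # end of the requested read (exclusive)
print(external_read_in_bounds(offset, size, file_length))
```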
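I tried using Chinese CLIP as the text encoder for Stable Diffusion, but it always generates pure black images (the safety checker is already disabled). Can it be used as the SD text encoder? Have the authors tested this?
Hello, I get an error when running zeroshot_eval.sh with the command below. Is there something wrong with how I wrote the command? Thanks. The ${DATAPATH} folder is named data.
The error is as follows: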
$ bash run_scripts/zeroshot_eval.sh 0 \
> data fgvc-aircraft-2013b-variants102 \
> ViT-B-16 RoBERTa-wwm-ext-base-chinese \
> data/ckpt
Traceback (most recent call last):
File "E:/Project/CLIP/Chinese-CLIP-master/cn_clip/eval/zeroshot_evaluation.py", line 18, in <module>
from cn_clip.eval.data import get_zeroshot_dataset, _preprocess_text
ImportError: cannot import name 'get_zeroshot_dataset' from 'cn_clip.eval.data' (E:\Anaconda\envs\PyTorch\lib\site-packages\cn_clip\eval\data.py)
The script is as follows:
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=`pwd`/cn_clip
path=/data/datasets
dataset=fgvc-aircraft-2013b-variants102-example
datapath=${path}/datasets/${dataset}/test:data/datasets/fgvc-aircraft-2013b-variants102-example/test
savedir=${path}/save_predictions:data/pretrained_weights
vision_model=ViT-B-16
text_model=RoBERTa-wwm-ext-base-chinese
resume=data/pretrained_weights/clip_cn_vit-b-16.pt
label_file=${path}/${dataset}/label_cn.txt
index=${7:-}
python -u E:/Project/CLIP/Chinese-CLIP-master/cn_clip/eval/zeroshot_evaluation.py \
--datapath="${datapath}" \
--label-file=${label_file} \
--save-dir=${savedir} \
--dataset=${dataset} \
--index=${index} \
--img-batch-size=64 \
--resume=${resume} \
--vision-model=${vision_model} \
--text-model=${text_model}
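I want to train my own model and then run retrieval with it.
Following the docs: pretrained CKPT
Dataset format preprocessing
Model finetune
I can get a new .pt model file in output.
How do I run image-to-image or text-to-image search with my own model file?
KNN retrieval
In the example,
${split}_imgs.img_feat.jsonl
it is not clear to me how this file is generated,
and I could not find where the image or text input goes.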
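Conceptually, KNN retrieval works on feature files rather than on raw images or text: each line of a file such as ${split}_imgs.img_feat.jsonl holds an id and a feature vector produced beforehand by a feature-extraction step, and retrieval ranks the image features by similarity to a query feature. The sketch below illustrates the idea in plain Python. The field names `image_id` and `feature`, the toy vectors and the helper `top_k_images` are assumptions for illustration, not the repository's code.

```python
import json
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_k_images(query_feature, image_jsonl_lines, k=2):
    """Rank images by cosine similarity to the query feature."""
    query = normalize(query_feature)
    scored = []
    for line in image_jsonl_lines:
        record = json.loads(line)
        feat = normalize(record["feature"])
        score = sum(q * f for q, f in zip(query, feat))
        scored.append((score, record["image_id"]))
    scored.sort(reverse=True)
    return [image_id for _, image_id in scored[:k]]

# Toy stand-in for the lines of an image-feature jsonl file.
image_lines = [
    json.dumps({"image_id": 1, "feature": [1.0, 0.0, 0.0]}),
    json.dumps({"image_id": 2, "feature": [0.0, 1.0, 0.0]}),
    json.dumps({"image_id": 3, "feature": [0.9, 0.1, 0.0]}),
]
text_feature = [1.0, 0.05, 0.0]  # would come from the text encoder
result = top_k_images(text_feature, image_lines, k=2)
print(result)
```

For image-to-image search the query feature would come from the image encoder instead; the ranking step stays the same.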
torch==1.9.0
torchvision==0.10.0
lmdb==1.3.0
cuda version 10.2
The above is my environment. Finetuning works with the default clip_cn_vit-b-16.pt, but fails after switching to clip_cn_rn50.pt. Below are the changes I made in the launch script.
checkpoint=clip_cn_rn50.pt
vision_model=RN50
text_model=RBT3-chinese
Hi an~ @yangapku
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the forward
function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple checkpoint
functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases yet.
Do you have any suggestions to solve this problem?
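How does CN-CLIP use the multiple captions that correspond to a single image in the Flickr data? How do you avoid several captions of the same image landing in the same batch and being treated as false negatives? Thanks.
Hello, after processing the zero data I have about 70 million samples stored in LMDB. However, LMDB can only produce a single data.mdb that lives on one device, which creates a data-loading I/O bottleneck during training and leaves the CPU and GPU underutilized. Have you run into this problem?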
2023-02-23,06:08:33 | INFO | Rank 0 | Validation Result (epoch 3 @ 99 steps) | Valid Loss: 0.000000 | Image2Text Acc: 100.00 | Text2Image Acc: 100.00 | logit_scale: 4.605 | Valid Batch Size: 1
2023-02-23,06:08:40 | INFO | Rank 0 | Saved checkpoint ../clip_set/experiments/muge_finetune_vit-b-16_roberta-base_bs128_8gpu_poizon/checkpoints/epoch3.pt (epoch 3 @ 99 steps) (writing took 7.470757007598877 seconds)
2023-02-23,06:08:48 | INFO | Rank 0 | Saved checkpoint ../clip_set/experiments/muge_finetune_vit-b-16_roberta-base_bs128_8gpu_poizon/checkpoints/epoch_latest.pt (epoch 3 @ 99 steps) (writing took 7.439142227172852 seconds)
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.8/logging/handlers.py", line 1482, in _monitor
record = self.dequeue(True)
File "/usr/lib/python3.8/logging/handlers.py", line 1431, in dequeue
return self.queue.get(block)
File "/usr/lib/python3.8/multiprocessing/queues.py", line 97, in get
res = self._recv_bytes()
File "/usr/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
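Thanks for sharing this work.
Does the training set contain cases where one image corresponds to multiple texts, or one text to multiple images? My finetuning set has many such many-to-many samples. What is the most appropriate way to handle them? For one image with multiple texts, is it better to concatenate the texts into one long text, or to split them into multiple image-text pairs? Thanks!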
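One possible approach (an assumption here, not something the source confirms the authors do) is to keep every caption as a separate image-text pair but build batches so that no image id appears twice in the same batch, which removes that source of false negatives from the in-batch contrastive loss. A minimal sketch with hypothetical names:

```python
import random

def make_batches(pairs, batch_size, seed=0):
    """Group (image_id, text) pairs into batches with unique image ids.

    Greedy: each pair goes into the first open batch that does not
    already contain its image id; a batch is closed once it is full.
    """
    rng = random.Random(seed)
    pairs = list(pairs)
    rng.shuffle(pairs)
    open_batches, done = [], []
    for image_id, text in pairs:
        target = None
        for batch in open_batches:
            if image_id not in {other_id for other_id, _ in batch}:
                target = batch
                break
        if target is None:
            target = []
            open_batches.append(target)
        target.append((image_id, text))
        if len(target) >= batch_size:
            open_batches = [b for b in open_batches if b is not target]
            done.append(target)
    return done + open_batches  # trailing batches may be smaller

pairs = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (3, "e"), (4, "f")]
batches = make_batches(pairs, batch_size=3)
for batch in batches:
    print(batch)
```

The same grouping can be keyed on the text instead of the image id when one text maps to several images.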
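Cross-modal models almost always focus on img2text or text2img performance, which reflects how well the modalities are aligned. After cross-modal alignment pretraining, how does single-modality retrieval compare with feature extractors pretrained on ImageNet, such as the ResNet family? In a quick experiment I took the image tower of a cross-modal pretrained model such as ViT-B-16 as a feature extractor, built a small image vector retrieval database, and compared it with vgg16; the results were only about the same as vgg16...
Is it just to make speed comparison easier? I suggest supporting a dynamic batch_size.
On COCO-CN and Flickr-CN, the small Chinese-CLIP RN50 model performs about the same as or worse than the ViT-B models of Wukong/R2D2, yet on MUGE it is far ahead. What is the reason?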
KNN retrieval
In the example,
the input should include text or an image, but I do not see one.
What is going on?
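Great code; this feels like the most considerate and easy-to-use VLP code framework at the moment. Thanks to all the authors for their patient open-source work! Do the authors plan to add code for finetuning on generation tasks (such as image captioning)? There does not yet seem to be a Chinese VLP model pretrained with captioning the way BLIP is, and I wonder how CN-CLIP performs on generation tasks. Finetuning code would be most welcome ❤️
After downloading the pretrained model, I called extract_features.py under eval directly, and loading the model reports missing bert.pooler parameters (Missing key(s) in state_dict: "bert.pooler.dense.weight", "bert.pooler.dense.bias"). I see that the code handles this part explicitly (sd = {k[len('module.'):]: v for k, v in sd.items() if "bert.pooler" not in k}), but I have not found a way to solve the problem. Why does the code single out this part, and how can I fix the error?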
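The dictionary comprehension quoted above does two things to the checkpoint's state dict: it strips the leading `module.` prefix from every key, and it drops every key containing `bert.pooler`. The `module.` prefix is what PyTorch's DistributedDataParallel wrapper adds to parameter names; the reading that the pooler weights are simply not stored in the released checkpoints is an interpretation, not something the source states. A toy illustration with a plain dict standing in for the real state dict:

```python
# Toy stand-in for checkpoint["state_dict"]; real values would be tensors.
sd = {
    "module.visual.conv1.weight": "w0",
    "module.bert.encoder.layer.0.output.dense.weight": "w1",
    "module.bert.pooler.dense.weight": "w2",
    "module.bert.pooler.dense.bias": "w3",
}

# Same expression as in the question: strip "module." and skip pooler keys.
cleaned = {k[len("module."):]: v for k, v in sd.items() if "bert.pooler" not in k}

for key in cleaned:
    print(key)
```

If a model built elsewhere still expects the pooler keys, passing `strict=False` to `load_state_dict` is a standard PyTorch way to tolerate missing keys, though whether that is appropriate depends on how the model was constructed.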
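Hello, besides the 100 million Wukong image-text pairs, what else makes up the dataset? Also, are you considering open-sourcing the dataset?
Hello
Running the training command gives the error in the title. Which part of the configuration is wrong?
Hello, I ran into the following problem during training: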
cd Chinese-CLIP/ bash run_scripts/muge_finetune_vit-b-16_rbt-base.sh ${DATAPATH}
The following problem appears:
root@clip-test-d9cd48656-q2zbl:~/workspace/clip/Chinese-CLIP# bash run_scripts/muge_finetune_vit-b-16_rbt-base.sh ../clip_set/
/usr/local/lib/python3.8/dist-packages/torch/distributed/launch.py:180: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects
`--local_rank` argument to be set, please change it to read from
`os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
Root Cause (first observed failure):
[0]:
time : 2023-02-21_09:58:00
host : clip-test-d9cd48656-q2zbl
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 723)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Can you tell what the cause is?
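The FutureWarning in the log is separate from the actual failure: the log only shows rank 1 exiting with code 1, without a traceback. For the warning itself, the migration it describes is to read the local rank from the environment instead of relying only on a `--local_rank` argument. A minimal sketch, where the function name `get_local_rank` is hypothetical:

```python
import argparse
import os

def get_local_rank(argv=None):
    """Read the local rank from LOCAL_RANK, falling back to --local_rank."""
    parser = argparse.ArgumentParser()
    # torch.distributed.launch passes --local_rank; torchrun sets LOCAL_RANK.
    parser.add_argument("--local_rank", "--local-rank", type=int, default=0)
    args, _unknown = parser.parse_known_args(argv)
    return int(os.environ.get("LOCAL_RANK", args.local_rank))

os.environ["LOCAL_RANK"] = "1"                # what torchrun would set
print(get_local_rank([]))                     # the environment value is used
del os.environ["LOCAL_RANK"]
print(get_local_rank(["--local_rank", "3"]))  # legacy argument fallback
```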
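Thanks for open-sourcing this.
Were the provided pretrained models pretrained for image-to-text retrieval?
We tried image-to-text retrieval fine-tuning on our own data, and the recall is rather low.
Recall@100 0.23963270260850056
Thanks.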
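Hello, how did you manage a batch size of 32k when training ViT-L/14 on 128 V100 GPUs? Did you use fp16? With mixed precision my maximum batch size is 10k; anything larger runs out of memory.
As the title says, could you provide a finetuning script for COCO-CN on 8 GPUs? One whose results are not much worse :)
Hello, thanks for open-sourcing cnclip. I finetuned cnclip on my own data and want to convert the vision module to ONNX. After conversion, running it with onnxruntime reports the following error. What is going on?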
2022-12-06 20:54:43.837539176 [E:onnxruntime:, sequential_executor.cc:368 Execute] Non-zero status code returned while running Reshape node. Name:'Reshape_54' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:40 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, onnxruntime::TensorShapeVector&, bool) gsl::narrow_cast<int64_t>(input_shape.Size()) == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{197,1,768}, requested shape:{197,2364,64}
Traceback (most recent call last):
File "test.py", line 18, in
(text_feature) = sessison.run(output_names, {'image': image})
File "/home/jovyan/.local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 200, in run
return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'Reshape_54' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:40 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, onnxruntime::TensorShapeVector&, bool) gsl::narrow_cast<int64_t>(input_shape.Size()) == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{197,1,768}, requested shape:{197,2364,64}
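Has prompt ensembling been open-sourced yet?
Are they not using the same weights? Or is it not the same model at all? The performance differs far too much.
Hello, I am glad to be able to use your open-source Chinese CLIP. In my experiments, however, the similarity computed for every text-image pair is 1, while the output of your Pokemon demo is normal. Is this a problem with the model's training data, or am I using it incorrectly? Please advise.
Here is my data, 32 images in total; just unzip it into the directory containing the code. My script below takes about two minutes to run.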
# coding: utf8
from glob import glob
import os
import pickle
from prettytable import PrettyTable
import torch
from PIL import Image
import cn_clip.clip as clip
from cn_clip.clip import load_from_name
from tqdm import tqdm

device = 'cuda:0'
model_path = './'
data_path = './samples/*'
total = len(list(glob(os.path.join(data_path, '*')))) * len(os.listdir(model_path))
res = {}
pbar = tqdm(total=total)
for model_name in ['ViT-B-16', 'ViT-L-14', 'ViT-L-14-336', 'ViT-H-14', 'RN50']:
    model, preprocess = load_from_name(model_name, device=device, download_root=model_path)
    model.eval()
    res[model_name] = {}
    for dp in glob(data_path):
        # The directory name is used as the text query.
        text = dp.split('/')[-1]
        text_encode = clip.tokenize([text]).to(device)
        res[model_name][text] = {}
        path = os.path.join(dp, '*')
        for pic_path in glob(path):
            pic_name = pic_path.split('/')[-1].split('.')[0]
            image = preprocess(Image.open(pic_path)).unsqueeze(0).to(device)
            with torch.no_grad():
                logits_per_image, logits_per_text = model.get_similarity(image, text_encode)
                probs = logits_per_image.softmax(dim=-1).cpu().item()
            res[model_name][text][pic_name] = probs
            pbar.update(1)
for k in res:
    table = PrettyTable()
    table.title = '{} results'.format(k)
    table.field_names = ['model_name', 'text', 'true0', 'neg0', 'neg1', 'neg2']
    for kk in res[k]:
        row = [k, kk, res[k][kk]['true0'], res[k][kk]['neg0'], res[k][kk]['neg1'], res[k][kk]['neg2']]
        table.add_row(row)
    print(table)
Below are the results I got:
+------------+--------------------------------+-------+------+------+------+
| model_name | text | true0 | neg0 | neg1 | neg2 |
+------------+--------------------------------+-------+------+------+------+
| ViT-B-16 | 踩三轮的印度人 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-B-16 | 吃竹子的熊猫 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-B-16 | 大家在餐桌上交谈 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-B-16 | 非洲大草原上一头斑马正看着镜头 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-B-16 | 好多玫瑰花呀 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-B-16 | 沙地里的巨型仙人掌 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-B-16 | 舞狮的人 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-B-16 | 一艘油轮行驶在海洋中 | 1.0 | 1.0 | 1.0 | 1.0 |
+------------+--------------------------------+-------+------+------+------+
+------------+--------------------------------+-------+------+------+------+
| model_name | text | true0 | neg0 | neg1 | neg2 |
+------------+--------------------------------+-------+------+------+------+
| ViT-L-14 | 踩三轮的印度人 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-L-14 | 吃竹子的熊猫 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-L-14 | 大家在餐桌上交谈 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-L-14 | 非洲大草原上一头斑马正看着镜头 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-L-14 | 好多玫瑰花呀 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-L-14 | 沙地里的巨型仙人掌 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-L-14 | 舞狮的人 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-L-14 | 一艘油轮行驶在海洋中 | 1.0 | 1.0 | 1.0 | 1.0 |
+------------+--------------------------------+-------+------+------+------+
+--------------+--------------------------------+-------+------+------+------+
| model_name | text | true0 | neg0 | neg1 | neg2 |
+--------------+--------------------------------+-------+------+------+------+
| ViT-L-14-336 | 踩三轮的印度人 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-L-14-336 | 吃竹子的熊猫 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-L-14-336 | 大家在餐桌上交谈 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-L-14-336 | 非洲大草原上一头斑马正看着镜头 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-L-14-336 | 好多玫瑰花呀 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-L-14-336 | 沙地里的巨型仙人掌 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-L-14-336 | 舞狮的人 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-L-14-336 | 一艘油轮行驶在海洋中 | 1.0 | 1.0 | 1.0 | 1.0 |
+--------------+--------------------------------+-------+------+------+------+
+------------+--------------------------------+-------+------+------+------+
| model_name | text | true0 | neg0 | neg1 | neg2 |
+------------+--------------------------------+-------+------+------+------+
| ViT-H-14 | 踩三轮的印度人 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-H-14 | 吃竹子的熊猫 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-H-14 | 大家在餐桌上交谈 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-H-14 | 非洲大草原上一头斑马正看着镜头 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-H-14 | 好多玫瑰花呀 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-H-14 | 沙地里的巨型仙人掌 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-H-14 | 舞狮的人 | 1.0 | 1.0 | 1.0 | 1.0 |
| ViT-H-14 | 一艘油轮行驶在海洋中 | 1.0 | 1.0 | 1.0 | 1.0 |
+------------+--------------------------------+-------+------+------+------+
+------------+--------------------------------+-------+------+------+------+
| model_name | text | true0 | neg0 | neg1 | neg2 |
+------------+--------------------------------+-------+------+------+------+
| RN50 | 踩三轮的印度人 | 1.0 | 1.0 | 1.0 | 1.0 |
| RN50 | 吃竹子的熊猫 | 1.0 | 1.0 | 1.0 | 1.0 |
| RN50 | 大家在餐桌上交谈 | 1.0 | 1.0 | 1.0 | 1.0 |
| RN50 | 非洲大草原上一头斑马正看着镜头 | 1.0 | 1.0 | 1.0 | 1.0 |
| RN50 | 好多玫瑰花呀 | 1.0 | 1.0 | 1.0 | 1.0 |
| RN50 | 沙地里的巨型仙人掌 | 1.0 | 1.0 | 1.0 | 1.0 |
| RN50 | 舞狮的人 | 1.0 | 1.0 | 1.0 | 1.0 |
| RN50 | 一艘油轮行驶在海洋中 | 1.0 | 1.0 | 1.0 | 1.0 |
+------------+--------------------------------+-------+------+------+------+
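As you can see, the similarity is 1 both for the text's true matching image true0 and for the negative examples neg*, which gives the impression that the model cannot tell the images apart at all.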
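A likely explanation, offered as a reading of the script rather than a confirmed diagnosis: each call passes one image and one text to `model.get_similarity`, so `logits_per_image` has a single entry, and a softmax over a single value is always 1 regardless of the logit. The Pokemon demo compares one image against four texts, which is why its probabilities differ. The plain-Python sketch below reproduces the effect with made-up logits:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# One image vs. one text: the only candidate always gets probability 1.
print(softmax([23.7]))
print(softmax([-5.0]))

# One image vs. several texts: probabilities now depend on the logits.
print(softmax([23.7, 20.1, 18.4, 25.0]))
```

To compare a text against its four images, the raw logits (or the cosine similarities of the normalized features) can be compared directly, or all candidates can be scored together so that the softmax runs over more than one entry.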
import cn_clip.clip as clip
Exception raised: UnicodeDecodeError
Traceback (most recent call last):
File "D:\develop\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "D:\develop\anaconda3\lib\runpy.py", line 85, in run_code
exec(code, run_globals)
File "c:\Users\saizong.vscode\extensions\ms-python.python-2022.4.1\pythonFiles\lib\python\debugpy_main.py", line 45, in
cli.main()
File "c:\Users\saizong.vscode\extensions\ms-python.python-2022.4.1\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 444, in main
run()
File "c:\Users\saizong.vscode\extensions\ms-python.python-2022.4.1\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 285, in run_file
runpy.run_path(target_as_str, run_name=compat.force_str("main"))
File "D:\develop\anaconda3\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "D:\develop\anaconda3\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "D:\develop\anaconda3\lib\runpy.py", line 85, in run_code
exec(code, run_globals)
File "d:\develop\workspace\today_video\clipcn.py", line 5, in
import cn_clip.clip as clip
File "D:\develop\anaconda3\Lib\site-packages\cn_clip\clip_init.py", line 3, in
_tokenizer = FullTokenizer()
File "D:\develop\anaconda3\Lib\site-packages\cn_clip\clip\bert_tokenizer.py", line 170, in init
self.vocab = load_vocab(vocab_file)
File "D:\develop\anaconda3\Lib\site-packages\cn_clip\clip\bert_tokenizer.py", line 132, in load_vocab
token = convert_to_unicode(reader.readline())
UnicodeDecodeError: 'gbk' codec can't decode byte 0x81 in position 1564: illegal multibyte sequence
How should I handle this? Thanks!
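The traceback shows the vocabulary file being decoded with the `gbk` codec, which is the default text encoding on many Chinese-locale Windows installs when `open()` is called without an `encoding` argument. A possible workaround (an assumption, not the maintainers' confirmed fix) is to open the file with `encoding="utf-8"` explicitly, or to enable Python's UTF-8 mode through the `PYTHONUTF8=1` environment variable. The sketch below uses a hypothetical `load_vocab_utf8` helper and a temporary file rather than the package's real vocabulary:

```python
import os
import tempfile

def load_vocab_utf8(vocab_file):
    """Read one token per line, decoding explicitly as UTF-8."""
    vocab = {}
    with open(vocab_file, "r", encoding="utf-8") as reader:
        for index, line in enumerate(reader):
            vocab[line.rstrip("\n")] = index
    return vocab

# Write a tiny UTF-8 vocabulary file to demonstrate the round trip.
tokens = ["[PAD]", "[UNK]", "猫", "眼镜"]
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as handle:
    handle.write("\n".join(tokens) + "\n")
    path = handle.name

vocab = load_vocab_utf8(path)
os.remove(path)
print(vocab)
```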
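Thanks for open-sourcing. In the technical report, the caption of Table 2, which shows cross-modal retrieval results on the MUGE dataset, says "on text-to-image retrieval AND image-to-text retrieval", but unless I am misreading it, the table only shows text-to-image retrieval. For the other two datasets, COCO-CN and Flickr30K-CN, both image-to-text and text-to-image results are shown, so I would also like to know the image-to-text retrieval results on MUGE.
Hello, training one epoch on 2 machines with 4 A100s each takes about 10 h, while one epoch on a single machine with 4 A100s takes 30 min. What causes multi-node, multi-GPU training to be so much less efficient?
How do I apply this model? For example, the project gives this example: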
import torch
from PIL import Image
import cn_clip.clip as clip
from cn_clip.clip import load_from_name, available_models
print("Available models:", available_models())
# Available models: ['ViT-B-16', 'ViT-L-14', 'ViT-L-14-336', 'ViT-H-14', 'RN50']
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = load_from_name("ViT-B-16", device=device, download_root='./')
model.eval()
image = preprocess(Image.open("examples/pokemon.jpeg")).unsqueeze(0).to(device)
text = clip.tokenize(["杰尼龟", "妙蛙种子", "小火龙", "皮卡丘"]).to(device)
with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize the features; use the normalized image and text features for downstream tasks
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    logits_per_image, logits_per_text = model.get_similarity(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()
print("Label probs:", probs) # [[1.268734e-03 5.436878e-02 6.795761e-04 9.436829e-01]]
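How do I make this load the model I trained myself? Any guidance is appreciated, thanks.
Hello, in lines 35 to 44 of train.py, why is the operation in lines 35 to 44 performed again when aggregating into the global batch? Can't gathered_image_features and gathered_text_features be used directly?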
Could you please provide the MUGE, Flickr30K-CN and COCO-CN data you used for eval?
Thanks.
Can this be used on a computer without a discrete GPU (the CPU has Intel integrated graphics)? If so, how should it be configured?
Hello team,
I followed the guide here: https://github.com/OFA-Sys/Chinese-CLIP/blob/master/deployment.md and successfully obtained the ONNX models listed below:
vit-b-16.txt.fp32.onnx 391 MB
vit-b-16.txt.fp16.onnx 2.27 MB
vit-b-16.img.fp32.onnx 332 MB
vit-b-16.img.fp16.onnx 3.34 MB
vit-b-16.txt.fp16.onnx.extra_file 194 MB
vit-b-16.img.fp16.onnx.extra_file 164 MB
But when I deployed the image model ("vit-b-16.img.fp32.onnx") to Android, I hit the following exception:
ai.onnxruntime.OrtException: Error code - ORT_INVALID_GRAPH - message: This is an invalid model. Error in Node:/visual/Unsqueeze : Node (/visual/Unsqueeze) has input size 2 not in range [min=1, max=1]. at ai.onnxruntime.OrtSession.createSession(Native Method) at ai.onnxruntime.OrtSession.<init>(OrtSession.java:82) at ai.onnxruntime.OrtEnvironment.createSession(OrtEnvironment.java:206) at ai.onnxruntime.OrtEnvironment.createSession(OrtEnvironment.java:179)
I am a newbie here. Could your team give some suggestions for overcoming this bug?
Thanks so much.
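Hello, with your help my previous problem with zeroshot_eval.sh is solved; it now runs successfully and reaches the accuracy you mentioned. Thank you very much.
Next I wanted to try training the model, but the training command produces no response. Could you help me with this?
The ${DATAPATH} folder is named data, and for datasets I use the preprocessed MUGE provided by the authors.
The command I entered is:
bash run_scripts/muge_finetune_vit-b-16_rbt-base.sh data
After entering it, there is no feedback at all.
The script settings are as follows: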
GPUS_PER_NODE=1
WORKER_CNT=1
export MASTER_ADDR=localhost
export MASTER_PORT=8514
export RANK=0
export PYTHONPATH=${PYTHONPATH}:`pwd`/cn_clip/
…
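The rest of the script is unchanged.
The ViT-B-16.json file can be found under open-clip; where can I find this one?
Hello, after loading the model with the code below I want to finetune CLIP. I first froze the CLIP parameters and trained an added fully connected layer on my task, then unfroze part of the CLIP parameters, but the CLIP model outputs NaN. How should I finetune the CLIP model?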
model, preprocess = load_from_name("ViT-B-16", device=device, download_root='./')
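That is, computing cosine distance with the extracted image and text vectors; I tried it on a real dataset and the problem seems to be there as well.
Hello
I am running the MUGE dataset, and the validation metrics never change during training, as shown below: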
2022-12-08,21:22:28 | INFO | Rank 1 | Validation Result (epoch 2 @ 4650 steps) | Valid Loss: 1.668444 | Image2Text Acc: 32.41 | Text2Image Acc: 32.94 | logit_scale: 4.595 | Valid Batch Size: 48
2022-12-08,21:22:28 | INFO | Rank 0 | Validation Result (epoch 2 @ 4650 steps) | Valid Loss: 1.668444 | Image2Text Acc: 32.41 | Text2Image Acc: 32.94 | logit_scale: 4.595 | Valid Batch Size: 48
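The accuracy stays around 32.
How can I reproduce the 60+ accuracy reported in the repository?
Hello, thank you very much for open-sourcing Chinese CLIP; it has been a great help to our academic research.
Our lab has also tried pretraining on large datasets, but the speed of PyTorch all_gather drops significantly as the number of nodes grows, which makes training take far longer. How did you deal with all_gather speed during training? Thanks.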
Thanks for sharing these models and results for Chinese. Multilingual clips are important!
I see you have evaluation code for imagenet with Chinese prompts. Do you have results of your Chinese clip on imagenet with Chinese prompts and classnames?
A question: the results returned are image ids, so how do I find the image from an id?
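Assuming the dataset follows a layout in which images are kept in a `${split}_imgs.tsv` file, one line per image, with an integer image id and a base64-encoded image separated by a tab (an assumption to check against your own data), an id returned by retrieval can be mapped back to the image bytes as sketched below. The helper name and the toy data are hypothetical.

```python
import base64

def build_image_index(tsv_lines):
    """Map image id -> raw image bytes from 'id<TAB>base64' lines."""
    index = {}
    for line in tsv_lines:
        image_id, b64 = line.rstrip("\n").split("\t", 1)
        index[int(image_id)] = base64.b64decode(b64)
    return index

# Toy stand-in for the lines of an image TSV file; the payloads are
# placeholder bytes, not real images.
toy_lines = [
    "1000002\t" + base64.b64encode(b"fake-image-a").decode("ascii") + "\n",
    "1000016\t" + base64.b64encode(b"fake-image-b").decode("ascii") + "\n",
]
index = build_image_index(toy_lines)
print(index[1000016])
```

With real data, the bytes for an id can then be opened with PIL through `Image.open(io.BytesIO(index[image_id]))`.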