yangjianxin1 / clip-chinese

Chinese CLIP pre-trained model

Python 100.00%

chinese clip

clip-chinese's Introduction

Hi there 👋, I'm Yang Jianxin

I'm an NLPer interested in large language models. I graduated from SYSU with a master's degree.

In my free time, I like to write technical blogs on [WeChat Official Account: YeungNLP] and [Zhihu: 红雨瓢泼].

🔭 Experiences:

Shopee: responsible for building NLP capabilities for customer service (2022-04 to present).
Tencent: responsible for building NLP capabilities for product understanding (2021-06 to 2022-04).
Alibaba: internship (2020-06 to 2020-09).

⚙ Here are some of my public projects:

Project	Description	Code
Firefly	One-stop training for LLMs. Some achievements: 1. firefly-llama2-13b ranked 3rd among all 13B models on the Open LLM Leaderboard, only 0.5 points behind 1st place. 2. firefly-llama-30b, trained with a single V100, ranked 10th among all 30B models on the Open LLM Leaderboard. 3. firefly-baichuan-13b has over 1.63 million downloads. 4. firefly-qwen1.5-en-7b-dpo improves by 7.21 points over the official chat model. 5. firefly-gemma-7b improves by 9.37 points over the official chat model.
GPT2-chitchat	Chinese GPT2 for chitchat
Firefly-LLaMA2-Chinese	Chinese Llama2 with an efficient and effective training method.
LongQLoRA	Efficient and effective method for extending the context length of Llama2 to 8192 with a single V100. Technical Report
CPM	Chinese composition model based on CPM
CLIP-Chinese	Chinese CLIP model trained with 1.4 million image-text pairs
ClipCap-Chinese	Chinese image caption model based on CLIP and Mengzi
OFA-Chinese	Chinese multi-modal unified pre-training model
LLMPruner	Prune vocabulary of LLMs to save memory in training.

📁 Here are some of my technical blogs:

clip-chinese's Issues

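Is fine-tuning needed?

Hello! I would like to use your model to compute image-text similarity on a local dataset. Do I need to fine-tune it first? Thanks!

Error during training: the 'text' key cannot be found when loading data. Any guidance would be much appreciated.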

KeyError: 'text'

Hello, I ran into a "no CUDA" problem

I ran the following on Windows: python train_clip.py --train_args_file train_args/train_clip.json

Error:
Traceback (most recent call last):
  File "train_clip.py", line 136, in <module>
    main()
  File "train_clip.py", line 86, in main
    args, training_args = parser.parse_json_file(json_file=train_args_file)
  File "E:\ancanda\envs\CLIP-Chinese\lib\site-packages\transformers\hf_argparser.py", line 392, in parse_json_file
    outputs = self.parse_dict(data, allow_extra_keys=allow_extra_keys)
  File "E:\ancanda\envs\CLIP-Chinese\lib\site-packages\transformers\hf_argparser.py", line 367, in parse_dict
    obj = dtype(**inputs)
  File "<string>", line 105, in __init__
  File "E:\ancanda\envs\CLIP-Chinese\lib\site-packages\transformers\training_args.py", line 1133, in __post_init__
    raise ValueError(
ValueError: FP16 Mixed precision training with AMP or APEX (--fp16) and FP16 half precision evaluation (--fp16_full_eval) can only be used on CUDA devices.

The CUDA version is 12.4:
(CLIP-Chinese) F:\python\CLIP-Chinese>nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:28:36_Pacific_Standard_Time_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0

Is this a version incompatibility?
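One thing worth ruling out here: nvcc reports the version of the system CUDA toolkit, which is separate from the CUDA runtime the installed PyTorch build was compiled against. A CPU-only PyTorch build sees no CUDA device even when the toolkit is installed. The sketch below is a minimal diagnostic, not part of the repository. The helper name torch_cuda_report is made up for illustration, and it assumes PyTorch is the backend that transformers is using.

```python
def torch_cuda_report() -> dict:
    """Summarize what the installed PyTorch build knows about CUDA.

    Hypothetical helper for illustration; not part of CLIP-Chinese.
    """
    report = {
        "torch_installed": False,
        "torch_version": None,
        "built_with_cuda": None,   # CUDA version the wheel was built against
        "cuda_available": False,   # whether a usable CUDA device is visible
    }
    try:
        import torch
    except ImportError:
        return report

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds.
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

If built_with_cuda prints None, the installed PyTorch is probably a CPU-only build, and the toolkit version shown by nvcc would not be the cause of the error.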

KeyError: 'text'

When I follow your steps to train, I always get the following error:

KeyError: Caught KeyError in DataLoader worker process 0.
KeyError: 'text'
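The thread does not say how the training annotations are stored, so the following is only a sketch under an assumption: that they are in a CSV file whose rows are looked up by column name, with a column called text. In that setup, a common cause of this KeyError is a header that does not contain text exactly, for example because the file was saved with a UTF-8 byte order mark, which turns the first column name into '\ufefftext'. The function names are hypothetical.

```python
import csv
from typing import Iterable, List, Sequence


def missing_columns(lines: Iterable[str],
                    required: Sequence[str] = ("text",)) -> List[str]:
    """Return the required column names that are absent from the CSV header.

    `lines` is any iterable of text lines, such as an open file object.
    """
    header = csv.DictReader(lines).fieldnames or []
    return [name for name in required if name not in header]


def check_csv_file(path: str) -> List[str]:
    """Check a CSV file on disk, stripping a UTF-8 BOM if one is present."""
    # "utf-8-sig" decodes UTF-8 and drops a leading byte order mark.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return missing_columns(f)


# A header that starts with a BOM does not match "text" exactly:
print(missing_columns(["\ufefftext,filename\n", "a dog,1.jpg\n"]))
# A clean header does:
print(missing_columns(["text,filename\n", "a dog,1.jpg\n"]))
```

Running a check like this on the annotation file before training would show whether the column is actually missing or only misnamed.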

TypeError: unsupported operand type(s) for *: 'dict' and 'int'

I am a beginner in natural language processing.

After cloning the repository and installing the dependencies, I tried to run the quickstart code in the README and got the error below. How should I solve it?

My Python version is 3.8.15; the other dependencies match the versions in requirements.txt.

Traceback (most recent call last):
  File "quickstart.py", line 15, in <module>
    inputs = processor(text=["一只小狗在摇尾巴", "一只小猪在吃饭"], images=image, return_tensors="pt", padding=True)
  File "/home/liuzhiming/.miniconda3/envs/clip-chinese/lib/python3.8/site-packages/transformers/models/clip/processing_clip.py", line 85, in __call__
    image_features = self.feature_extractor(images, return_tensors=return_tensors, **kwargs)
  File "/home/liuzhiming/.miniconda3/envs/clip-chinese/lib/python3.8/site-packages/transformers/models/clip/feature_extraction_clip.py", line 146, in __call__
    images = [self.resize(image=image, size=self.size, resample=self.resample) for image in images]
  File "/home/liuzhiming/.miniconda3/envs/clip-chinese/lib/python3.8/site-packages/transformers/models/clip/feature_extraction_clip.py", line 146, in <listcomp>
    images = [self.resize(image=image, size=self.size, resample=self.resample) for image in images]
  File "/home/liuzhiming/.miniconda3/envs/clip-chinese/lib/python3.8/site-packages/transformers/models/clip/feature_extraction_clip.py", line 207, in resize
    new_short, new_long = size, int(size * long / short)
TypeError: unsupported operand type(s) for *: 'dict' and 'int'

I printed the values of several variables in resize (transformers/models/clip/feature_extraction_clip.py, line 207) and found that size is a dict, not an int. The specific values are as follows:

size:  {'shortest_edge': 224}
long=960
short=600
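The printed values suggest that the loaded preprocessor configuration stores size as a dict with a shortest_edge key, while the resize code in the installed transformers version multiplies size as if it were an int. That usually points to a mismatch between the saved config and the library version, though the thread does not confirm it. The sketch below only illustrates the arithmetic with the size normalized to an int. The helper names are hypothetical and this is not a patch to transformers.

```python
from collections.abc import Mapping
from typing import Tuple, Union


def shortest_edge(size: Union[int, dict]) -> int:
    """Accept either an int or a {'shortest_edge': n} mapping and return n."""
    if isinstance(size, Mapping):
        return int(size["shortest_edge"])
    return int(size)


def resized_dims(size: Union[int, dict], long: int, short: int) -> Tuple[int, int]:
    """Mirror the failing line: new_short, new_long = size, int(size * long / short)."""
    edge = shortest_edge(size)
    return edge, int(edge * long / short)


# Values reported in the issue: size={'shortest_edge': 224}, long=960, short=600
print(resized_dims({"shortest_edge": 224}, long=960, short=600))  # (224, 358)
```

A possible workaround, untested here, is to set the extractor's size attribute (the traceback shows the code reading self.size) to the plain integer before calling the processor. Installing the transformers version that the saved config was written with is another option.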

Can I use CPU for training?

When I install the CPU-only build of PyTorch and run training, I get this error:
FP16 Mixed precision training with AMP or APEX (--fp16) and FP16 half precision evaluation (--fp16_full_eval) can only be used on CUDA devices.
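The message says the fp16 options require a CUDA device, so a CPU run would need them turned off. The sketch below assumes that the JSON file passed via --train_args_file is where fp16 gets enabled, which the thread does not confirm. The helper names and the output path in the usage comment are hypothetical.

```python
import json
from pathlib import Path

# Flags named in the error message.
FP16_KEYS = ("fp16", "fp16_full_eval")


def disable_fp16(args: dict) -> dict:
    """Return a copy of the training args with any fp16 flags that are present set to False."""
    fixed = dict(args)
    for key in FP16_KEYS:
        if key in fixed:
            fixed[key] = False
    return fixed


def write_cpu_args(src: str, dst: str) -> None:
    """Read a training-args JSON file and write a copy with fp16 disabled."""
    args = json.loads(Path(src).read_text(encoding="utf-8"))
    Path(dst).write_text(
        json.dumps(disable_fp16(args), indent=2, ensure_ascii=False),
        encoding="utf-8",
    )


# Usage (the destination file name is made up for illustration):
# write_cpu_args("train_args/train_clip.json", "train_args/train_clip_cpu.json")
```

With the flags off, the argument check should no longer raise, but training a CLIP model on CPU is likely to be very slow.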


Are the original ckpt weights available?
