Prompt-Can-Anything

A fully automated toolkit: give it a prompt and click once. With SOTA models, a prompt, and some creativity, you can do anything.

Motivation

Current: Building a fully automated AI tool that lets engineering and research teams create data engines; this may require more CLIP models.

Target: Generate high-quality annotation data and use it to train our own models.

So it is simply a tool for prompting anything: YOCO (you only click once).

Auto-label tool, current structure (YOCO)

In addition, we will introduce video, audio, and 3D annotation in the future.

Semi-automatic interaction UI tool (coming soon)

Features

🔥Data Engine

Provide fully automated data annotation with one-click export (detection, segmentation, text, and NeRF reconstruction results), refined through engineering optimization. By combining these with related Stable Diffusion and GPT models, we can create more data sources for downstream tasks.
Extended one-click annotation and training for third-party projects such as YOLO and LoRA models (coming soon).
Accelerated processing of videos and datasets (coming soon).

⭐ Research🚀 project🔥 Inspiration（In preparation）

  At the research level, zero-shot contrastive learning is a research trend. We want to understand the design details of the models we apply as fully as possible, so that we can combine text, images, and audio to design a strongly aligned backbone.
  At the project level, TensorRT acceleration of the base models improves inference efficiency.

⭐[news list]

-【2023/5/4】 Added semantic segmentation labels and new args (--color-flag, --save-mask).

-【2023/4/26】 YOCO, automatic annotation tools: committed preliminary code. For an input image or folder, you can obtain detection, segmentation, and text annotation results; the ChatGPT API is optional.

Preliminary-Works

Segment Anything : Strong segmentation model, but it needs prompts (like boxes/points) to generate masks.
Grounding DINO : Strong zero-shot detector capable of generating high-quality boxes and labels from free-form text.
Stable-Diffusion : Amazingly strong text-to-image diffusion model.
Tag2text : Efficient and controllable vision-language model which can simultaneously output superior image captioning and image tagging.
lama : Resolution-robust large mask Inpainting with Fourier Convolutions
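The descriptions above suggest how the models can be chained: a tagger proposes nouns, a text-prompted detector turns them into boxes, and a box-prompted segmenter turns each box into a mask. The following is only a minimal sketch of that flow, not the project's actual code. `Annotation`, `annotate`, and the three `fake_*` stand-ins are hypothetical names introduced here so the example runs without any model weights.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Annotation:
    label: str
    box: Box
    mask: object  # whatever the segmenter returns for this box


def annotate(
    image: object,
    tagger: Callable[[object], Sequence[str]],
    detector: Callable[[object, Sequence[str]], Sequence[Tuple[str, Box]]],
    segmenter: Callable[[object, Box], object],
) -> List[Annotation]:
    """Chain tagging -> text-prompted detection -> box-prompted segmentation."""
    tags = tagger(image)                # e.g. a Tag2Text-style tagger
    detections = detector(image, tags)  # e.g. a Grounding DINO-style detector
    # Each detected box becomes the prompt for the segmenter (SAM-style).
    return [Annotation(label, box, segmenter(image, box)) for label, box in detections]


# Hypothetical stand-ins for the real models, used only to exercise the flow.
def fake_tagger(image):
    return ["dog", "ball"]


def fake_detector(image, tags):
    # Pretend every tag is found once, at the same location.
    return [(tag, (0.0, 0.0, 10.0, 10.0)) for tag in tags]


def fake_segmenter(image, box):
    return f"mask for {box}"


results = annotate("image.jpg", fake_tagger, fake_detector, fake_segmenter)
for r in results:
    print(r.label, r.box)
```

Passing the models in as callables keeps the flow independent of any particular checkpoint, so the tagger can be replaced by a fixed noun list when tagging is turned off.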

🛠️ YOCO: Quick Start

First, make sure you have a basic GPU deep learning environment.

(Linux is recommended. Windows may have problems compiling the Grounding DINO deformable-transformer operator; see Grounding DINO.)

git clone https://github.com/positive666/Prompt-Can-Anything
cd Prompt-Can-Anything

Install environment:

pip install -e .

Install diffusers (optional):

pip install --upgrade diffusers[torch]

If other packages are missing, install them with "pip install <your missing packages>".

Run

Download the model weights:

	Name	Backbone	Data	Checkpoint	Model config
1	Tag2Text-Swin	Swin-Base	COCO, VG, SBU, CC-3M, CC-12M	Download link
2	Segment-anything	ViT		Download link | Download link | Download link
3	Lama			Download link
4	GroundingDINO-T	Swin-T	O365, GoldG, Cap4M	Github link | HF link	link
5	GroundingDINO-B	Swin-B	COCO, O365, GoldG, Cap4M, OpenImage, ODinW-35, RefCOCO	Github link | HF link	link

Set the config file and args in utils/conf.py: add the paths of your downloaded weights to "MODEL_xxxx_PATH"; if you need ChatGPT, also configure "PROXIES" and "API_KEY".
Run the demo:

"--tag2text": generates image tags; you can use ChatGPT to merge or filter the words
"--input_prompt": the target nouns you want to detect; with this set, you can turn off Tag2text
"--color-flag": gives all semantic segmentation masks of the same category the same color
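Merging or filtering tags does not strictly need ChatGPT; simple cases can be handled locally. The sketch below assumes the tags arrive as a list of strings and that the detector takes lowercase phrases separated by " . ", which is the separator suggested in the upstream Grounding DINO README (check this against the version you use). `build_prompt` and its `keep` parameter are hypothetical names, not part of this project.

```python
from typing import Iterable, Optional, Set


def build_prompt(tags: Iterable[str], keep: Optional[Set[str]] = None) -> str:
    """Normalize, de-duplicate and optionally filter tags, then join them."""
    seen = set()
    phrases = []
    for tag in tags:
        phrase = " ".join(tag.lower().split())  # lowercase, trim, collapse spaces
        if not phrase or phrase in seen:
            continue  # skip empty strings and repeated tags
        if keep is not None and phrase not in keep:
            continue  # skip tags the user is not interested in
        seen.add(phrase)
        phrases.append(phrase)
    return " . ".join(phrases)


tags = ["Dog", "dog ", "grass", "frisbee", ""]
print(build_prompt(tags))                           # dog . grass . frisbee
print(build_prompt(tags, keep={"dog", "frisbee"}))  # dog . frisbee
```

The order of first appearance is preserved, so the prompt stays stable across runs on the same image.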

python demo.py  --source <data path>  --save-txt  --save-mask --save-xml  --save_caption
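To illustrate what "same category, same color" means for a semantic mask, here is a minimal standard-library sketch. It is not how the project implements --color-flag, which is not documented here; the hash-based palette and the names `category_color` and `colorize` are assumptions made for illustration, and the mask is a plain nested list of category names rather than an image array.

```python
import hashlib
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]


def category_color(name: str) -> Color:
    """Derive a stable RGB color from the category name."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return digest[0], digest[1], digest[2]


def colorize(label_map: List[List[str]], background: Color = (0, 0, 0)) -> List[List[Color]]:
    """Turn a grid of category names ('' means background) into RGB colors."""
    palette: Dict[str, Color] = {}
    colored = []
    for row in label_map:
        colored_row = []
        for name in row:
            if not name:
                colored_row.append(background)
            else:
                # Every pixel of a category reuses that category's color.
                colored_row.append(palette.setdefault(name, category_color(name)))
        colored.append(colored_row)
    return colored


label_map = [
    ["dog", "dog", ""],
    ["", "cat", "dog"],
]
colored = colorize(label_map)
print(colored[0][0] == colored[1][2])  # both pixels are "dog"
```

Deriving the color from the name rather than from the order of appearance keeps a category's color consistent across images and runs.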

🏃 Demo

🔨To Do list

💘 Acknowledgements

Segment Anything
Grounding DINO
Tag2text
lama

Thanks for their great work!
