Prompt-Can-Anything

A fully automated toolkit: give it a prompt and click once (YOCO, "you only click once"). With SOTA models, a prompt, and some creativity, you can do almost anything.

Motivation

Current: Build a fully automated AI tool for engineering and research that can serve as a data engine. This may require the use of more CLIP models.

Target: Generate high-quality annotated data and use it to train our own models.

In short, it is a tool for prompting anything (YOCO).

  1. Auto-label tool, current structure (YOCO)

    Video, audio, and 3D annotation will be added in the future.

    [Figure: structure]

  2. Semi-automatic interaction UI tool (coming soon)

Feature

  • 🔥Data Engine

    Provides fully automated data annotation with one-click export of detection, segmentation, text, and NeRF reconstruction results, refined through engineering optimization. Combined with related models such as Stable Diffusion and GPT, this can create more data sources for downstream tasks.

  • Extended one-click annotation and training for third-party projects, such as YOLO and LoRA models (coming soon)

  • Accelerated processing of videos and datasets (coming soon)

⭐ Research🚀 project🔥 Inspiration (in preparation)
  At the research level, zero-shot contrastive learning is a current trend. We want to understand the design details of the models we apply as thoroughly as possible, so that we can combine text, images, and audio to design a strongly aligned backbone.
  At the project level, TensorRT acceleration of the base models improves efficiency.

⭐[news list]

-【2023/5/4】 Added semantic segmentation labels and new args (--color-flag, --save-mask).

-【2023/4/26】 YOCO, automatic annotation tool: committed preliminary code. For an input image or folder, you can obtain detection, segmentation, and text annotation results; the ChatGPT API is optional.

Preliminary-Works

  • Segment Anything : Strong segmentation model. But it needs prompts (like boxes/points) to generate masks.

  • Grounding DINO : Strong zero-shot detector capable of generating high-quality boxes and labels from free-form text.

  • Stable-Diffusion : Very strong text-to-image diffusion model.

  • Tag2text : Efficient and controllable vision-language model which can simultaneously output superior image captioning and image tagging.

  • lama : Resolution-robust large mask Inpainting with Fourier Convolutions
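The way these models feed into one another for auto-labeling can be sketched roughly as follows. This is only an illustration, not the project's actual code: `annotate`, `Annotation`, and the three callables (`tagger`, `detector`, `segmenter`) are hypothetical names standing in for Tag2Text, Grounding DINO, and Segment Anything, and the box layout is an assumed convention.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed convention


@dataclass
class Annotation:
    label: str
    box: Box
    mask: Any


def annotate(
    image: Any,
    tagger: Callable[[Any], Sequence[str]],
    detector: Callable[[Any, str], Sequence[Tuple[str, Box]]],
    segmenter: Callable[[Any, Box], Any],
    input_prompt: Optional[str] = None,
) -> List[Annotation]:
    """Chain tagging -> text-prompted detection -> box-prompted segmentation."""
    # A user-supplied prompt replaces the automatically generated tags.
    prompt = input_prompt if input_prompt else ", ".join(tagger(image))
    annotations = []
    for label, box in detector(image, prompt):
        # The segmenter needs a prompt; here each detected box plays that role.
        annotations.append(Annotation(label, box, segmenter(image, box)))
    return annotations


if __name__ == "__main__":
    # Stand-in callables so the sketch runs without any model weights.
    fake_tagger = lambda img: ["dog", "ball"]
    fake_detector = lambda img, prompt: [
        (word.strip(), (0.0, 0.0, 1.0, 1.0)) for word in prompt.split(",")
    ]
    fake_segmenter = lambda img, box: "mask-for-%s" % (box,)
    for ann in annotate("image.jpg", fake_tagger, fake_detector, fake_segmenter):
        print(ann)
```

Passing the models in as callables keeps the sketch independent of any particular library API; real wrappers around each model would replace the stand-ins.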

🛠️ YOCO:Quick Start

First, make sure you have a basic GPU deep learning environment.

(Linux is recommended. On Windows, compiling the Grounding DINO deformable-transformer operator may fail; see Grounding DINO.)

git clone https://github.com/positive666/Prompt-Can-Anything
cd Prompt-Can-Anything

Install environment:

pip install -e .

Install diffusers (optional):

pip install --upgrade diffusers[torch]

If anything else is missing, run "pip install <your missing packages>".
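To find out which packages are still missing before running the demo, a small check like the one below may help. It is a minimal sketch using only the standard library; the package list is a hypothetical example, not the project's actual requirements, and import names can differ from pip package names.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level import names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list; replace it with the imports that fail on your machine.
    wanted = ["torch", "torchvision", "diffusers"]
    missing = missing_packages(wanted)
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All listed packages are importable.")
```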

Run

  1. Download the model weights

    #  Name              Backbone   Data                                              Checkpoint                                     Model config
    1  Tag2Text-Swin     Swin-Base  COCO, VG, SBU, CC-3M, CC-12M                      Download link                                  -
    2  Segment-anything  vit        -                                                 Download link | Download link | Download link  -
    3  Lama              -          -                                                 Download link                                  -
    4  GroundingDINO-T   Swin-T     O365,GoldG,Cap4M                                  Github link | HF link                          link
    5  GroundingDINO-B   Swin-B     COCO,O365,GoldG,Cap4M,OpenImage,ODinW-35,RefCOCO  Github link | HF link                          link
  2. Set the config file and args in utils/conf.py: add the paths of your downloaded weights to "MODEL_xxxx_PATH"; if you need ChatGPT, also configure "PROXIES" and "API_KEY".
  3. Run the demo
    "--tag2text": provides image tags; you can use ChatGPT to merge or filter the words
    "--input_prompt": selects the target nouns you want to detect; when it is set, you can turn off Tag2Text
    "--color-flag": gives all semantic segmentation masks of the same category the same color
python demo.py  --source <data path>  --save-txt  --save-mask --save-xml  --save_caption
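The idea behind "--color-flag" (one color per category rather than per instance) can be illustrated with the sketch below. It is not the tool's implementation: `category_color` and `paint_semantic_mask` are hypothetical helpers, masks are plain nested lists, and deriving the color from a hash of the category name is just one possible choice.

```python
import hashlib


def category_color(name):
    """Derive a stable RGB color from a category name."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return (digest[0], digest[1], digest[2])


def paint_semantic_mask(height, width, instances, same_color_per_category=True):
    """Paint binary instance masks into one RGB image (nested lists of tuples).

    instances: list of (category_name, mask), where mask is a
    height x width nested list of 0/1 values.
    """
    canvas = [[(0, 0, 0) for _ in range(width)] for _ in range(height)]
    for index, (category, mask) in enumerate(instances):
        # With the flag on, every instance of a category shares one color;
        # with it off, the instance index is mixed into the color key.
        key = category if same_color_per_category else "%s#%d" % (category, index)
        color = category_color(key)
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    canvas[y][x] = color
    return canvas


if __name__ == "__main__":
    dog_a = [[1, 0, 0], [0, 0, 0]]
    dog_b = [[0, 0, 1], [0, 0, 0]]
    cat = [[0, 0, 0], [0, 1, 0]]
    image = paint_semantic_mask(2, 3, [("dog", dog_a), ("dog", dog_b), ("cat", cat)])
    print(image[0][0] == image[0][2])  # both dog pixels share one color
```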

🏃 Demo

[Image: image-20230427093103453]

[Image: image-20230508075845259]

🔨To Do list

  • Release demo and code (within 2 days).
  • Web UI demo
  • Support video and ChatGPT; add an inpainting model demo
  • Add 3D NeRF demo
  • Fine-tune SAM and Grounding DINO (?)
  • Release training datasets.

💘 Acknowledgements
