Comments (9)

hanoonaR commented on August 10, 2024

Hello @remvanthull,

Thank you so much for your interest in our work! We are planning to release the GranD dataset along with the pre-training code very soon. The release is scheduled for about a week from now, after which you'll be able to experiment with training the model from scratch.

Stay tuned, and thank you again for your support.

from groundinglmm.

remvanthull commented on August 10, 2024

Hi @hanoonaR,

Thank you so much for your response!

That is amazing! Very much looking forward to that 😄 Are you by any chance also planning to release the code for the automated pipeline behind the GranD dataset? I'm very interested in that too! Thanks again!

hanoonaR commented on August 10, 2024

Hi @remvanthull,

The upcoming release will indeed include the code for the automated pipeline used to construct the GranD dataset. Additionally, we'll provide detailed instructions on how to convert it into the GLaMM dataset format. Thank you.

hanoonaR commented on August 10, 2024

Hi @remvanthull

We are pleased to announce the release of our dataset and the code for the automatic annotation pipeline. You can access the dataset here and find details on the annotation pipeline here.
Please let us know if you have any questions. Thank you.

remvanthull commented on August 10, 2024

Hi @hanoonaR

Thank you so much for letting me know! I am very excited to delve into this right away. Just one more question: are you also releasing instructions on how to pre-train GLaMM on GranD? Currently, all the code and models are for fine-tuning the pre-trained models, and I would very much like to experiment with training from scratch! 😄

Thank you again!

hanoonaR commented on August 10, 2024

Hi @remvanthull,

We've pre-trained the model using the GranD dataset along with some open-source datasets.

From GranD, we cover:

  1. Referring Expression Segmentation: GrandReferSegmDataset
  2. Region-Level Captioning: GrandReferRegDataset
  3. Short Captioning: GrandShortCaptionDataset
  4. Captioning with Groundings: Refer to the GCG dataset for the dataset class. The annotations are the same as the ones you prepare for short captioning, using prepare_grand_caption_grounding.py.
  5. Object-Level Segmentation: Refer to the SemanticSegmentation dataset for the dataset class. The annotations can be prepared using prepare_object_lvl_data.py.

From Open Source Datasets, we use:

  1. For Region Understanding: COCO-2017, RefCOCO, RefCOCO+ (Similar to the pretraining of GPT4RoI)
  2. For Segmentation: We use the datasets defined in SemanticSegmentation datasets.
  3. For Instruction Following: LLaVA Instruct 150k.
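
For anyone keeping track of the mix while setting things up, the two lists above can be written down as a small checklist. This is only a sketch: `PretrainSource`, `PRETRAIN_SOURCES`, and `group_by_origin` are hypothetical names made up for illustration and are not part of the groundingLMM code. The entries are transcribed from the lists above, not read from the repository.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PretrainSource:
    """One entry of the pre-training mix (hypothetical helper, not from the repo)."""
    origin: str            # "GranD" or "open-source"
    task: str              # task name as listed in the thread
    datasets: tuple = ()   # dataset names, only where the thread names them


# Transcribed from the two lists in the comment above.
PRETRAIN_SOURCES = [
    PretrainSource("GranD", "Referring Expression Segmentation"),
    PretrainSource("GranD", "Region-Level Captioning"),
    PretrainSource("GranD", "Short Captioning"),
    PretrainSource("GranD", "Captioning with Groundings"),
    PretrainSource("GranD", "Object-Level Segmentation"),
    PretrainSource("open-source", "Region Understanding",
                   ("COCO-2017", "RefCOCO", "RefCOCO+")),
    PretrainSource("open-source", "Segmentation"),
    PretrainSource("open-source", "Instruction Following",
                   ("LLaVA Instruct 150k",)),
]


def group_by_origin(sources):
    """Return a dict mapping each origin to the list of its task names."""
    grouped = defaultdict(list)
    for source in sources:
        grouped[source.origin].append(source.task)
    return dict(grouped)


if __name__ == "__main__":
    for origin, tasks in group_by_origin(PRETRAIN_SOURCES).items():
        print(f"{origin}: {len(tasks)} tasks")
        for task in tasks:
            print(f"  - {task}")
```

Ticking off each entry as its annotations are prepared makes it easier to spot which part of the mix is still missing before launching a run.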

If you encounter any challenges while configuring these datasets, or if you have specific questions related to any of the details provided, please feel free to reach out. Thank you.

remvanthull commented on August 10, 2024

Hi @hanoonaR

Thank you for responding so quickly once again! :)

What I was wondering is which checkpoint you used to initialize pre-training, i.e. what I should set --version to, so that I can replicate the pre-training myself. The checkpoints in the model zoo are all already pre-trained, but I'd like to experiment with pre-training :)

Thank you,

Rachel

hanoonaR commented on August 10, 2024

We initialize from LLaVA-1.5 (as that's our LLM).

hanoonaR commented on August 10, 2024

Hi @remvanthull,

Sorry, I forgot to mention that you will also need to set --pretrained to False. Please let me know if you face any issues loading the model from LLaVA-1.5 or setting up any dataset. Thank you.
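
To make the two settings concrete, here is a minimal sketch of how a launcher could accept them. It is not the repository's training script. Only the flag names --version and --pretrained come from this thread. The parser, the `str2bool` helper, the default for --pretrained, and the `path/to/llava-1.5` placeholder are all assumptions for illustration, so check the actual script for how it parses these flags.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False'-style strings into a bool.

    Plain type=bool would not work here: bool("False") is True,
    because any non-empty string is truthy.
    """
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    # Hypothetical parser; only the two flag names are taken from the thread.
    parser = argparse.ArgumentParser(description="Hypothetical pre-training launcher")
    parser.add_argument("--version", required=True,
                        help="checkpoint to initialize from (here: LLaVA-1.5)")
    parser.add_argument("--pretrained", type=str2bool, default=True,
                        help="set to False when starting from LLaVA-1.5 (default is an assumption)")
    return parser


if __name__ == "__main__":
    # The checkpoint path is a placeholder, not a real location.
    args = build_parser().parse_args(
        ["--version", "path/to/llava-1.5", "--pretrained", "False"]
    )
    print(args.version, args.pretrained)
```

The explicit string-to-bool conversion is there because passing the literal text False through a naive boolean flag is an easy way to end up with the opposite of what was intended.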
