
fawnliu / tris


[ICCV 2023] Official code release of our paper "Referring Image Segmentation Using Text Supervision"

Home Page: https://openaccess.thecvf.com/content/ICCV2023/papers/Liu_Referring_Image_Segmentation_Using_Text_Supervision_ICCV_2023_paper.pdf

License: MIT License

Python 97.05% Shell 2.95%
referring-image-segmentation weakly-supervised-learning


tris's Issues

Positive ResponseMap Selection

Thank you for your great work. I have a question. In the first stage of training, each image is paired with one positive text and N negative texts taken from other samples. Why, then, does stage 1 still need to select among the response maps generated by all of these pairs? Why not simply use the response map of the positive sample?

`demo.py` img_size hardcoded to incorrect value

Hi,

Thank you for sharing your work!

I noticed that in demo.py, the get_transform() function does not respect its size argument. The resize is hardcoded to (224, 224):

TRIS/demo.py, line 22 in b45f660:

transforms.Resize((224, 224)),

I believe this should be (size, size), which defaults to 320 px. I can confirm that this change produces a heatmap much closer to the example in your README.


Before fix: [image: heatmap]

After fix: [image: heatmap_fixed]

I'm still not sure why my heatmap doesn't match your example 100%, but the results are impressive nonetheless. 🙂

Regarding Bilateral Prompt

Thanks to the authors for sharing the code. I have the following question:

When computing the attention map for visual features, is Av in the line below an all-ones tensor? Only one language vector is used as the key, and the softmax is applied over the last dimension, which has size 1.

Av = F.softmax(Qv.matmul(Kt.transpose(1, 2)) / math.sqrt(Ci), dim=2)
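
The behaviour asked about can be checked independently of the repository. The following is a minimal sketch in plain Python, not the TRIS code: softmax and attention_weights are hypothetical helpers that mirror the shape of the expression above for a single batch element, with one row per visual query and one column per language key, and the feature values are made up. It illustrates that a softmax taken over an axis of length 1 returns 1 for every query, whatever the scores are.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys, channels):
    """Scaled dot-product attention weights, softmax taken over the key axis.

    Hypothetical stand-in for softmax(Qv @ Kt^T / sqrt(Ci), dim=2),
    written for a single batch element.
    """
    scale = math.sqrt(channels)
    rows = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        rows.append(softmax(scores))
    return rows


# Three visual positions with 2-channel features (made-up values).
Qv = [[0.5, -1.0], [2.0, 0.3], [-4.0, 7.0]]

# One language vector as the only key: the softmax axis has length 1.
single_key = attention_weights(Qv, [[1.0, 2.0]], channels=2)
print(single_key)

# Two keys: the weights now depend on the scores.
two_keys = attention_weights(Qv, [[1.0, 2.0], [-1.0, 0.5]], channels=2)
print(two_keys)
```

With a single key, every row is [1.0], so the weights carry no information about the scores. Whether this applies to the line quoted above depends on the actual shape of Kt, which the excerpt does not show.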

Too long all_eta

Thank you for your outstanding paper! I tried to retrain your model to use it as my baseline model.

[Screenshot from 2023-12-29 17-57-37]

This is my current progress in stage 1 training.

I measured that about 33 seconds are spent in loss.backward().

Is it correct that all_eta is logged as 5 days?
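
A rough sanity check for the logged ETA is to multiply the time per iteration by the number of iterations left. The sketch below is not taken from the TRIS training script: estimate_eta and iters_for_eta are hypothetical helpers, the 33 seconds is assumed to apply to every iteration (ignoring the forward pass and data loading), and the remaining iteration count is a placeholder chosen only to illustrate the arithmetic.

```python
from datetime import timedelta


def estimate_eta(seconds_per_iter, remaining_iters):
    """Estimated time left, assuming every iteration costs the same."""
    return timedelta(seconds=seconds_per_iter * remaining_iters)


def iters_for_eta(eta, seconds_per_iter):
    """Number of iterations implied by an ETA at a fixed per-iteration cost."""
    return eta.total_seconds() / seconds_per_iter


# 33 s is the backward-pass time reported above (assumed to be per iteration).
SECONDS_PER_ITER = 33

# A 5-day ETA at that rate corresponds to roughly 13,000 iterations.
implied = iters_for_eta(timedelta(days=5), SECONDS_PER_ITER)
print(f"iterations implied by a 5-day ETA: {implied:.0f}")

# Placeholder value: substitute the real number of remaining iterations.
remaining = 13_000
print(f"estimated time left: {estimate_eta(SECONDS_PER_ITER, remaining)}")
```

Under these assumptions, a 5-day ETA is plausible only if about 13,000 iterations of that cost remain. If far fewer remain, either the per-iteration time or the logged ETA deserves a closer look.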

No such file or directory: '../output/refcocog_umd/refcocog_train_names.json'

I ran the steps in the order you specified, but the following step fails:

Train IRNet and generate pseudo masks.
cd IRNet

dir=../output
CUDA_VISIBLE_DEVICES=0,1,2,3 python run_sample_refer.py --cam_out_dir $dir/refcocog_umd/cam --ir_label_out_dir $dir/refcocog_umd/ir_label --ins_seg_out_dir $dir/refcocog_umd/ins_seg --train_list $dir/refcocog_umd/refcocog_train_names.json --cam_eval_thres 0.15 --work_space output_refer/refcocog_umd --num_workers 8 --irn_batch_size 96 --cam_to_ir_label_pass True --train_irn_pass True --make_ins_seg_pass True

error: No such file or directory: '../output/refcocog_umd/refcocog_train_names.json'
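
Before launching a long multi-GPU run, it can help to confirm that every input file named on the command line exists. The following is a minimal sketch and not part of the repository: missing_inputs is a hypothetical helper that only reports which paths are absent. It does not say how refcocog_train_names.json is meant to be produced, which the report above does not show.

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


if __name__ == "__main__":
    out_dir = Path("../output")  # same value as `dir` in the command above
    required = [out_dir / "refcocog_umd" / "refcocog_train_names.json"]

    absent = missing_inputs(required)
    if absent:
        print("Missing input files:")
        for path in absent:
            print("  " + path)
    else:
        print("All input files found.")
```

Run from the IRNet directory, this resolves the relative path the same way the failing command does, so a reported missing file points to an earlier step whose output was not written where expected.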
