postbg / cosmo.pytorch

Official Implementation of CoSMo: Content-Style Modulation for Image Retrieval with Text Feedback presented in CVPR 2021.

License: Other

cosmo.pytorch's Issues

About other two datasets Shoes and Fashion200K

Hi, @numpee.
Could you please release the data processing method for the other two datasets? I would like to know how to train the model on Shoes and Fashion200K. Thank you.

Why image is in png format?

CosMo.pytorch/data/fashionIQ.py, lines 24 to 25 in 768b7b8:

    def _create_img_path_from_id(root, id):
        return os.path.join(root, '{}.png'.format(id))

Hi, @postBG . Thanks for your great work.

I believe the original images in the FashionIQ dataset are in JPG format, while your code reads images as PNG, which leads to an error.
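One possible workaround, sketched under the assumption that the downloaded images sit under the same root as `<id>.jpg`: resolve the path by trying several extensions instead of hard-coding `.png`. The name `find_img_path` and the extension list are hypothetical and not part of the repository.

```python
import os


def find_img_path(root, img_id, extensions=(".png", ".jpg", ".jpeg")):
    """Return the first existing file for img_id under root, trying each extension in order."""
    for ext in extensions:
        candidate = os.path.join(root, "{}{}".format(img_id, ext))
        if os.path.isfile(candidate):
            return candidate
    # Assumption: a missing image should fail loudly rather than be skipped.
    raise FileNotFoundError("No image for id {!r} under {!r}".format(img_id, root))
```

Whether the authors instead converted the images to PNG during pre-processing is not stated here, so this only avoids the file-not-found error; it does not reproduce their pipeline.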

Do you have any pre-processing?

Hi,
I am very interested in this task.
However, when I tried to download the dress data of the FashionIQ dataset, I found that about 905 image URLs are missing. I ended up with 18182 dress images. I would like to know whether you have the complete dataset and how many images it contains.
Also, could you send FashionIQ to my Gmail: [email protected]?
Thanks.
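To make such a count reproducible, a small check like the following could list which image ids are absent from a download folder. This is only a sketch: `missing_image_ids`, the flat directory layout, and the accepted extensions are assumptions, not something the repository provides.

```python
import os


def missing_image_ids(expected_ids, image_dir, extensions=(".jpg", ".png")):
    """Return the sorted ids from expected_ids that have no image file in image_dir."""
    present = set()
    for name in os.listdir(image_dir):
        stem, ext = os.path.splitext(name)
        # Only count files whose extension looks like an image (assumed list).
        if ext.lower() in extensions:
            present.add(stem)
    return sorted(set(expected_ids) - present)
```

The expected ids would come from the dataset's own split files; how those files are parsed is left out here.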

LSTM hidden size

In text_encoders/lstm.py, the hidden size of the LSTM is defined as:
lstm_hidden_size = kwargs.get('lstm_hidden_size', 512)

But in text_encoders/__init__.py, the desired LSTM hidden size (from the config) is not used.

Could you check this? Thanks.
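To illustrate the concern, here is a minimal sketch with stand-in names (the two `build_*` functions and the `config` dict are hypothetical, not the repository's code): the `kwargs.get('lstm_hidden_size', 512)` lookup silently falls back to 512 unless the caller forwards the configured value.

```python
def lstm_hidden_size_from(**kwargs):
    # Mirrors the lookup quoted above: default to 512 when the key is absent.
    return kwargs.get('lstm_hidden_size', 512)


def build_without_forwarding(config):
    # Hypothetical factory that never passes the configured value on.
    return lstm_hidden_size_from()


def build_with_forwarding(config):
    # Hypothetical fix: forward the configured value as a keyword argument.
    return lstm_hidden_size_from(lstm_hidden_size=config['lstm_hidden_size'])


config = {'lstm_hidden_size': 1024}
print(build_without_forwarding(config))  # 512
print(build_with_forwarding(config))     # 1024
```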

Problems about Fashion200k dataset

Hi, there.

Thank you for the great work!

According to the fashion200k repo, there are two versions of the Fashion200k pictures: cropped detected images and original images. Which one do you use in your implementation?

problem with fashioniq dataset in the drive

Hello. According to the paper, there should be around 77000 images in this dataset, but the dataset in this drive contains only around 74000 images. This causes problems when comparing results in academic research. Do you have access to the full dataset?

The time to train the model

Hi, I would like to know how long it took you to train the model on the three datasets.
Thank you.

Fashion200k result

Hello, I have some questions about the fashion200k dataset.

My reproduced result on fashion200k is 7~8% lower than the reported one, so I have several questions that I hope you can answer.

1. Does the gallery (database) contain all the images from data/labels/xxx_test_detect_all.txt?
2. Are the query images from data/test_queries.txt?
3. In your paper and on the GitHub page, you said that the modifier should be "Change A to B". However, we find that it is actually "Replace A with B" in the code link you provided. Does this have a negative impact?
4. The target image is not unique when testing on fashion200k (which is different from Shoes and FashionIQ). Is it correct to write the fashion200k dataset code following fashion_IQ? Are additional modifications needed?
5. Could you please release the code for these three datasets?
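For the question about non-unique targets, one common way to score such a setup is to count a query as a hit when any of its valid targets appears in the top k. The sketch below assumes that criterion; it is not confirmed to be what the paper or this repository uses, and `recall_at_k` is a hypothetical helper.

```python
def recall_at_k(ranked_ids_per_query, valid_targets_per_query, k):
    """Fraction of queries with at least one valid target among their top-k ranked ids."""
    hits = 0
    for ranked, valid in zip(ranked_ids_per_query, valid_targets_per_query):
        # A query is a hit if the top-k list shares any id with its valid targets.
        if set(ranked[:k]) & set(valid):
            hits += 1
    return hits / len(ranked_ids_per_query)


ranked = [["a", "b", "c"], ["d", "e", "f"]]
valid = [{"c", "z"}, {"x"}]
print(recall_at_k(ranked, valid, k=3))  # 0.5
```

With a single target per query, as in Shoes or FashionIQ, each `valid` set has one element and the same function applies unchanged.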

I have also sent these questions to you by email. I am afraid you may be too busy to notice my email, so I am raising this issue as well. I hope you can reply through whichever channel is convenient. Thanks a lot.

Concatenate two captions?

Hi, @numpee

I also found a slight difference in experimental settings between CoSMo and other methods.

As shown in the official FashionIQ evaluation codebase, the two captions of each triplet are concatenated into one sentence.

Following this setting, many other methods concatenate the two captions, while VAL doesn't.

I guess you don't concatenate the two captions in order to make a fair comparison with VAL.

However, I guess you might get higher performance by following this setting.
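The concatenation setting can be sketched as follows. The helper `merge_captions` is hypothetical, and the joiner `" and "` is an assumption rather than a confirmed detail of the official evaluation code.

```python
def merge_captions(captions, joiner=" and "):
    """Join the captions of one triplet into a single sentence."""
    # Strip stray whitespace so the joiner is the only separator.
    return joiner.join(c.strip() for c in captions)


print(merge_captions(["is solid black with no sleeves", "is black with straps"]))
# is solid black with no sleeves and is black with straps
```

Without concatenation, each caption would presumably be used as a separate modifier for the same reference and target images.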

Actual differences between FashionIQ evaluation method and VAL evaluation method

Hi!
The README points out that the evaluation method reported in the paper is slightly different from the evaluation method of the original FashionIQ dataset (in order to match the method used by VAL).

However, I can't quite figure out what the actual differences between the two methods are.
Can someone explain them to me in detail?
Thanks, Alberto

Could you share the Shoes dataset?

Hi,

Thank you for the great work! I found that the original Shoes dataset's link is broken. Could you share the dataset? Thank you very much.

Different results from provided notebook.

Hi, @postBG . Thanks for your great work.

I am preparing the vocab according to the instructions in the README. However, I get different outputs from those shown in jupyter_files/how_to_create_fashion_iq_vocab.ipynb.

The result of my third code block is:

is solid black with no sleeves and is black with straps
B005X4PL1G

And the result of the sixth is:
2957

I guess you intended to load the test split to build the vocab, while the val split is loaded instead. But I am not sure. Please give me some hints.
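As a rough illustration of why the loaded split changes the reported number, the sketch below builds a vocabulary by whitespace tokenization and shows its size growing when a second split is added. The function `build_vocab`, the tokenization, and the toy captions are all assumptions; the notebook's actual procedure may differ.

```python
from collections import Counter


def build_vocab(captions, min_count=1):
    """Return the sorted unique lowercase tokens that occur at least min_count times."""
    counts = Counter(tok for c in captions for tok in c.lower().split())
    return sorted(tok for tok, n in counts.items() if n >= min_count)


# Toy captions standing in for two splits (not real dataset statistics).
train_caps = ["is solid black with no sleeves", "is black with straps"]
val_caps = ["is red with long sleeves"]
print(len(build_vocab(train_caps)))             # 7
print(len(build_vocab(train_caps + val_caps)))  # 9
```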

Doubts about the results of TIRG in the paper

Hi @numpee ,

For the results of TIRG on fashion_iq reported in Table 1 of the main paper and in the supplementary material, did you run the experiments yourself and obtain similar results?

I understand that you copied the results reported in VAL. However, I believe these results are wrong.

In my experiments, I obtain the following performance on the original split with TIRG (with ResNet 50 and Bi-GRU; no GloVe or BERT embeddings were used):

Shirt R@10	Shirt R@50	Dress R@10	Dress R@50	Toptee R@10	Toptee R@50
18.50	43.03	21.81	46.26	24.02	51.10

This performance is much better than both VAL and my reproduced CoSMo, and is very close to the reported CoSMo.

Also, this paper reaches the same conclusion as I do (although our settings are different, our comparison between VAL and TIRG is fair; please see Table 1 for details): TIRG is much better than VAL, and the results reported in VAL are wrong.

If our observations are correct, then how can the performance gain of CoSMo be demonstrated (although I totally agree with CoSMo's insight)?

Please correct me if I am wrong. Thanks in advance.

About the result

I tested the model on the dress dataset and got the result above. The results do not match the paper, even though I ran it with the command you provided. Do you know why?

I want to confirm one thing about the training details

Do you use the same training settings as VAL?

hyperparameter settings

Hi, I ran the example code (fashioniq dataset) with the default configuration provided in the project, but couldn't achieve the results reported in the paper. I wonder if there is a problem with my hyperparameter settings (I adjusted the random seed, learning rate, etc.): the highest I can achieve is a top@50 accuracy of 44% on the toptee sub-dataset (the value reported in the paper is about 57%).
If I want to reproduce the experimental results in the paper, could you please give me some suggestions for my experiments?

Separately trained on FashionIQ subsets?

Hi, @numpee . Another question please.

I wonder whether you trained three separate models on FashionIQ dress/toptee/shirt.

In other words, are the results shown in Table 1 from one model or three models?
