
Comments (30)

zaaachos avatar zaaachos commented on August 10, 2024 1

@jainnipun11 It's easy. You have to go to this line -->

self.model.eval()

and do the same in the eval block for the test set.

Below I will write the updated code for the validation set only. You have to add the same for the test set (if you want to check its predicted captions). This snippet collects the image ids along with their captions in each epoch.

self.model.eval()
with torch.no_grad():
    val_gts, val_res = [], []
    val_images_list = list()
    for batch_idx, (images_id, images, reports_ids, reports_masks) in enumerate(self.val_dataloader):
        images, reports_ids, reports_masks = images.to(self.device), reports_ids.to(
            self.device), reports_masks.to(self.device)
        output = self.model(images, mode='sample')
        reports = self.model.tokenizer.decode_batch(output.cpu().numpy())
        ground_truths = self.model.tokenizer.decode_batch(reports_ids[:, 1:].cpu().numpy())
        val_res.extend(reports)
        val_gts.extend(ground_truths)
        val_images_list.extend(images_id)
    val_met = self.metric_ftns({i: [gt] for i, gt in enumerate(val_images_list, val_gts)},
                               {i: [re] for i, re in enumerate(val_images_list, val_res)})
    log.update(**{'val_' + k: v for k, v in val_met.items()})

Then go to metrics.py, where the self.metric_ftns function receives the (image_id, report) pairs.

Add the following import at this line -->

import pandas as pd

If pandas is not already installed, open a terminal and run:
pip3 install pandas (or pip install pandas, depending on which command points to your Python 3 environment)

Lastly, add the following code snippets at this line -->

# first save the ground truth captions
df = pd.DataFrame.from_dict(gts, orient="index")
df.to_csv('path/to/your/folder/gold.csv', sep='\t', header=False)

# then save the predicted ones    
df = pd.DataFrame.from_dict(res, orient="index")
df.to_csv('path/to/your/folder/pred.csv', sep='\t', header=False)

The snippets above save all (image_id, prediction) pairs and (image_id, gold) pairs in CSV format.
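If you then want to inspect the predictions next to the references one by one, here is a small follow-up sketch (paths as used above; it assumes the dictionary keys written to the first column are the image ids, as they are once the metric call uses zip over the image ids, discussed further down this thread):

import pandas as pd

# Read the saved files back and align them by image id for side-by-side inspection.
gold = pd.read_csv('path/to/your/folder/gold.csv', sep='\t', header=None,
                   names=['image_id', 'report'])
pred = pd.read_csv('path/to/your/folder/pred.csv', sep='\t', header=None,
                   names=['image_id', 'report'])
merged = gold.merge(pred, on='image_id', suffixes=('_gold', '_pred'))
print(merged.head())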

DONE!

from r2gen.

jainnipun11 avatar jainnipun11 commented on August 10, 2024 1

@zaaachos Okay, sure I will! Thanks for the help.

from r2gen.

zaaachos avatar zaaachos commented on August 10, 2024 1

@jainnipun11 I had replied to that question (regarding the base model without RM) and told you to ask the authors. I hadn't tried that.

from r2gen.

Liqq1 avatar Liqq1 commented on August 10, 2024 1

@Liqq1 Hey there 😊

  1. I am not the author of this paper, so I don't know what this is about; I have only used it for my research. I think it's more appropriate to ask them directly (send an email as well)!
  2. No, because they did not use the whole IU-Xray dataset as I did in my research. If you used the pretrained weights on THEIR provided test set, without any modification to the model (including the parameters they use and the random seed), I cannot explain this difference. It's weird, actually.

Thanks for your reply! 😄
The author probably isn't following the issues of his project, so I couldn't confirm it with him on GitHub. You're right, I think I should send him an email to confirm.

from r2gen.

Liqq1 avatar Liqq1 commented on August 10, 2024 1

@jainnipun11 I had replied to that question (regarding the base model without RM) and told you to ask the authors. I hadn't tried that.

Hi! I'm sorry to bother you. I hope to get your help.

  1. Have you tried to reproduce the results using the weights provided by the author? I tried this, but the results (on IU X-ray) I got were not the same as those shown in the paper; they are higher than the paper's.
    (screenshot: reproduced IU X-ray results)
  2. Results on the MIMIC-CXR dataset: why are the results in the following two tables different? The author states in the paper that "the number of memory slots is set to 3 by default", but the results in Table 1 differ from the third row of Table 2. Do you know why?
    (screenshots: Table 1 and Table 2 from the paper)

Hello! May I ask whether your first question has been resolved? Why are the reproduced IU X-ray results so high?

Hello,
At the moment, I haven't received a definitive answer. I suspect that the authors may have rerun the experiments later and achieved better results. In my view, such a scenario is plausible since the metrics for this task are highly volatile, and even the same model can yield very different outcomes with varying parameters or on different machines.

from r2gen.

zaaachos avatar zaaachos commented on August 10, 2024 1

@rhyhck We had a pre-downloaded version of the MIMIC dataset in my university servers when we worked for our academic research, so I don't have any information here, sorry. However, I think it is accessible with an account only, from their official website.

OK! I have got an account, but my network is not stable, so I don't know whether the dataset downloaded completely. Could you tell me the size of every file, especially the one named "files" in the dataset? Thank you very much!

As far as I remember, the dataset consists of 371,920 X-rays, with 227,943 cases and 14 classes. Unfortunately, I no longer have access to the university servers, so I cannot check the exact file sizes.

You could find information through this link, though.

from r2gen.

zaaachos avatar zaaachos commented on August 10, 2024

@jainnipun11 Do you mean to print the predicted caption in each step for each image? Or to store the predicted captions in a CSV (or a file format you want) and then observe the predictions one by one?

from r2gen.

jainnipun11 avatar jainnipun11 commented on August 10, 2024

Hey! Either way will work; I just want to see the predicted captions alongside the actual ones. Please guide me on how to get them in either of the ways you mentioned. Thanks.

from r2gen.

jainnipun11 avatar jainnipun11 commented on August 10, 2024

Hey @zaaachos, I followed your instructions but received the following error:

/usr/local/lib/python3.7/dist-packages/torchvision/models/_utils.py:209: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
f"The parameter '{pretrained_param}' is deprecated since 0.13 and will be removed in 0.15, "
/usr/local/lib/python3.7/dist-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing weights=ResNet101_Weights.IMAGENET1K_V1. You can also use weights=ResNet101_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
/content/R2Gen/modules/caption_model.py:73: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
beam_ix = ix // vocab_size # Nxb which beam
Traceback (most recent call last):
File "main.py", line 124, in
main()
File "main.py", line 120, in main
trainer.train()
File "/content/R2Gen/modules/trainer.py", line 54, in train
result = self._train_epoch(epoch)
File "/content/R2Gen/modules/trainer.py", line 218, in _train_epoch
val_met = self.metric_ftns({i: [gt] for i, gt in enumerate(val_images_list, val_gts)},

TypeError: 'list' object cannot be interpreted as an integer

What else do I modify? Thanks.

from r2gen.

zaaachos avatar zaaachos commented on August 10, 2024

@jainnipun11 My fault, sorry. You have to change each enumerate to zip, like this:

zip(val_images_list, val_gts)
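That is, the metric call in trainer.py becomes (the dictionary keys are now the image ids rather than integer indices, which is also what lets metrics.py write out id/report pairs):

val_met = self.metric_ftns({i: [gt] for i, gt in zip(val_images_list, val_gts)},
                           {i: [re] for i, re in zip(val_images_list, val_res)})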

from r2gen.

jainnipun11 avatar jainnipun11 commented on August 10, 2024

Hey @zaaachos, your suggestions above worked amazingly, thank you! Just one more thing: if I want to use MobileNet as a pretrained model, what changes do I have to make? Can you please guide me on that?

Thanks,
Nipun Jain

from r2gen.

zaaachos avatar zaaachos commented on August 10, 2024

@jainnipun11 Firstly, you need to check in the official PyTorch documentation for MobileNet whether it is available for your torchvision version or not.

Secondly, you need to check whether the CUDA version of your GPU (if you run on a GPU) is compatible with the required Torch version. I had some issues when I ran this code with some CNN models because of my GPU's CUDA version. One CNN that does work well with this code is DenseNet-121. Below is the code block you need to change in order to experiment with models other than ResNet101.

First, comment out lines 11 to 14 in this py file -->

model = getattr(models, self.visual_extractor)(pretrained=self.pretrained)

and add this code block instead:

model = models.densenet121(pretrained = True)     # for DenseNet-121
newmodel = torch.nn.Sequential(*(list(model.children())[:-1]))
self.model = newmodel
self.avg_fnt = torch.nn.AvgPool2d(kernel_size=7, stride=1, padding=0)

If you want to experiment with more models, please check this link.

Lastly, you need to change the dimension of the visual features depending on the image encoder you use.
You have to change the number on this line -->

R2Gen/main.py

Line 34 in 5a2e63f

parser.add_argument('--d_vf', type=int, default=2048, help='the dimension of the patch features.')

Here 2048 is the dimension of the last average-pooled features of ResNet101.
For DenseNet-121, for example, you have to substitute it with 1024.
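If MobileNet turns out to be available for your torchvision version, the same pattern should work. A hedged sketch for MobileNetV2, which I have not run myself (it follows the DenseNet example above; MobileNetV2's last feature map has 1280 channels, so --d_vf would become 1280):

import torch
import torchvision.models as models

model = models.mobilenet_v2(pretrained=True)      # MobileNetV2 backbone
self.model = model.features                       # keep only the convolutional feature extractor
self.avg_fnt = torch.nn.AvgPool2d(kernel_size=7, stride=1, padding=0)
# In main.py, set --d_vf 1280, since MobileNetV2's last conv layer outputs
# 1280 channels (replacing ResNet101's 2048).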

from r2gen.

jainnipun11 avatar jainnipun11 commented on August 10, 2024

Hey @zaaachos, thanks for the steps above, they were really helpful. I used MobileNet and was able to get fine results following your advice. I'm also trying to run the model without MCLN+RM, i.e. with just the base. Can you guide me on how to do that?

Thanks & Regards,
Nipun

from r2gen.

zaaachos avatar zaaachos commented on August 10, 2024

@jainnipun11 Hey 👋. I haven't experimented with this step before, so please contact the authors.

from r2gen.

jainnipun11 avatar jainnipun11 commented on August 10, 2024

Hey @zaaachos, have you tried running the base model? Can you help with that?

Thanks,
Nipun Jain

from r2gen.

zaaachos avatar zaaachos commented on August 10, 2024

@jainnipun11 Hello. You have to download the weights and find the argument in main.py that loads the path of the weights file. Add the file path as its default.

More particularly, here -->

R2Gen/main.py

Line 83 in 5a2e63f

parser.add_argument('--resume', type=str, help='whether to resume the training from existing checkpoints.')
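For example, assuming the downloaded checkpoint is saved as checkpoints/model_iu_xray.pth (the path is illustrative), the argument would become:

parser.add_argument('--resume', type=str, default='checkpoints/model_iu_xray.pth',
                    help='whether to resume the training from existing checkpoints.')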

from r2gen.

jainnipun11 avatar jainnipun11 commented on August 10, 2024

Hey @zaaachos, by base model I meant without Relational Memory. Downloading the weights won't solve this issue, right? Logically, they would still contain the weights of the parameters involved in RM. Am I getting this right?

Thanks,
Nipun

from r2gen.

Liqq1 avatar Liqq1 commented on August 10, 2024

@jainnipun11 I had replied to that question (regarding the base model without RM) and told you to ask the authors. I hadn't tried that.

Hi! I'm sorry to bother you. I hope to get your help.

  1. Have you tried to reproduce the results using the weights provided by the author? I tried this, but the results (on IU X-ray) I got were not the same as those shown in the paper; they are higher than the paper's.
    (screenshot: reproduced IU X-ray results)

  2. Results on the MIMIC-CXR dataset: why are the results in the following two tables different? The author states in the paper that "the number of memory slots is set to 3 by default", but the results in Table 1 differ from the third row of Table 2. Do you know why?
    (screenshots: Table 1 and Table 2 from the paper)

from r2gen.

zaaachos avatar zaaachos commented on August 10, 2024

@Liqq1 Hey there 😊

  1. I am not the author of this paper, so I don't know what this is about; I have only used it for my research. I think it's more appropriate to ask them directly (send an email as well)!

  2. No, because they did not use the whole IU-Xray dataset as I did in my research. If you used the pretrained weights on THEIR provided test set, without any modification to the model (including the parameters they use and the random seed), I cannot explain this difference. It's weird, actually.

from r2gen.

Ammexm avatar Ammexm commented on August 10, 2024

@jainnipun11 I had replied to that question (regarding the base model without RM) and told you to ask the authors. I hadn't tried that.

Hi! I'm sorry to bother you. I hope to get your help.

  1. Have you tried to reproduce the results using the weights provided by the author? I tried this, but the results (on IU X-ray) I got were not the same as those shown in the paper; they are higher than the paper's.
    (screenshot: reproduced IU X-ray results)
  2. Results on the MIMIC-CXR dataset: why are the results in the following two tables different? The author states in the paper that "the number of memory slots is set to 3 by default", but the results in Table 1 differ from the third row of Table 2. Do you know why?
    (screenshots: Table 1 and Table 2 from the paper)

Hello! May I ask whether your first question has been resolved? Why are the reproduced IU X-ray results so high?

from r2gen.

zaaachos avatar zaaachos commented on August 10, 2024

Hey @Ammexm.
Yes, indeed. As @Liqq1 said, working with different resources (in particular, stronger ones) can produce better results. The random seed is also a parameter you should take into consideration when working with the same data.
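On the seed point, a minimal seeding sketch in plain PyTorch (the value is illustrative; check main.py for whether and how the repo seeds its runs, and note that results can still drift across hardware and CUDA versions):

import random
import numpy as np
import torch

def set_seed(seed: int) -> None:
    # Fix the Python, NumPy and PyTorch RNGs so repeated runs start from the
    # same initialisation and data shuffling.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(42)  # illustrative value; use whatever the repo's seed argument is set to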

from r2gen.

rhyhck avatar rhyhck commented on August 10, 2024

@jainnipun11 Do you mean to print the predicted caption in each step for each image? Or to store the predicted captions in a CSV (or a file format you want) and then observe the predictions one by one?

Hi! Do you know how to get the MIMIC-CXR dataset? I tried to get it with "wget -r -N -c -np --user renhongyi --ask-password https://physionet.org/files/mimic-cxr-jpg/2.0.0/", but I can't download it, as shown below:
"Resolving physionet.org (physionet.org)... 18.18.42.54
Connecting to physionet.org (physionet.org)|18.18.42.54|:443... failed: Connection timed out.
Retrying."
Can you help me? Thanks!!

from r2gen.

zaaachos avatar zaaachos commented on August 10, 2024

@rhyhck We had a pre-downloaded version of the MIMIC dataset in my university servers when we worked for our academic research, so I don't have any information here, sorry. However, I think it is accessible with an account only, from their official website.

from r2gen.

rhyhck avatar rhyhck commented on August 10, 2024

@rhyhck We had a pre-downloaded version of the MIMIC dataset in my university servers when we worked for our academic research, so I don't have any information here, sorry. However, I think it is accessible with an account only, from their official website.

OK! I have got an account, but my network is not stable, so I don't know whether the dataset downloaded completely. Could you tell me the size of every file, especially the one named "files" in the dataset? Thank you very much!

from r2gen.

rhyhck avatar rhyhck commented on August 10, 2024

@rhyhck We had a pre-downloaded version of the MIMIC dataset in my university servers when we worked for our academic research, so I don't have any information here, sorry. However, I think it is accessible with an account only, from their official website.

OK! I have got an account, but my network is not stable, so I don't know whether the dataset downloaded completely. Could you tell me the size of every file, especially the one named "files" in the dataset? Thank you very much!

As far as I remember, the dataset consists of 371,920 X-rays, with 227,943 cases and 14 classes. Unfortunately, I no longer have access to the university servers, so I cannot check the exact file sizes.

You could find information through this link, though.

OK! Thank you for your help~

from r2gen.

rhyhck avatar rhyhck commented on August 10, 2024

@rhyhck We had a pre-downloaded version of the MIMIC dataset in my university servers when we worked for our academic research, so I don't have any information here, sorry. However, I think it is accessible with an account only, from their official website.

OK! I have got an account, but my network is not stable, so I don't know whether the dataset downloaded completely. Could you tell me the size of every file, especially the one named "files" in the dataset? Thank you very much!

As far as I remember, the dataset consists of 371,920 X-rays, with 227,943 cases and 14 classes. Unfortunately, I no longer have access to the university servers, so I cannot check the exact file sizes.

You could find information through this link, though.

Hello, if I want to take any image as input and output the corresponding report with R2Gen, how can I achieve that? How can I skip training and directly run testing on a ready-made image?

from r2gen.

zaaachos avatar zaaachos commented on August 10, 2024

@rhyhck We had a pre-downloaded version of the MIMIC dataset in my university servers when we worked for our academic research, so I don't have any information here, sorry. However, I think it is accessible with an account only, from their official website.

OK! I have got an account, but my network is not stable, so I don't know whether the dataset downloaded completely. Could you tell me the size of every file, especially the one named "files" in the dataset? Thank you very much!

As far as I remember, the dataset consists of 371,920 X-rays, with 227,943 cases and 14 classes. Unfortunately, I no longer have access to the university servers, so I cannot check the exact file sizes.
You could find information through this link, though.

Hello, if I want to take any image as input and output the corresponding report with R2Gen, how can I achieve that? How can I skip training and directly run testing on a ready-made image?

You will find the answer at the beginning of this thread. First, you have to download the weights and load their path directly into the args of main.py. Then, you have to initialize a single-image dataset if you want to test it out, or just use plain PyTorch for the inference stage.

A great PyTorch course I recently finished is this one; you will find a good explanation of the training and testing you are asking about.
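As a rough illustration of the plain-PyTorch inference option, here is a minimal sketch. It assumes a trained R2Gen model with the checkpoint already loaded and uses the tokenizer attached to the model (as in the validation code earlier in this thread); the preprocessing values and the duplicated two-view input for IU X-ray are my assumptions, so treat it as a starting point rather than the repo's exact procedure:

import torch
from PIL import Image
from torchvision import transforms

# Illustrative preprocessing; the exact resize and normalisation values are assumptions.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = preprocess(Image.open('path/to/xray.png').convert('RGB')).unsqueeze(0)  # (1, 3, 224, 224)
# IU X-ray models take two views per study; duplicating the single image stands in for the second view.
images = torch.stack([image, image], dim=1)                                     # (1, 2, 3, 224, 224)

model.eval()  # `model` is the R2Gen model with the downloaded weights loaded
with torch.no_grad():
    output = model(images, mode='sample')
    report = model.tokenizer.decode_batch(output.cpu().numpy())[0]
print(report)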

from r2gen.

rhyhck avatar rhyhck commented on August 10, 2024

@rhyhck We had a pre-downloaded version of the MIMIC dataset in my university servers when we worked for our academic research, so I don't have any information here, sorry. However, I think it is accessible with an account only, from their official website.

OK!I have got an account, but my net is not so stable that I dont't know whether the dataset is completely downloaded. Could you tell me the size of the every file? especially the named "files" in the dataset. Thank you very much!

As far as I remember the data consists of 371,920 X-rays, with 227,943 cases and 14 classes. Unfortunately, I no longer have access to the uni server's and I can not see the exact file size.
You could find information though from this link

Hello, if I want to take any image as input and output the corresponding report with R2Gen, how can I achieve that? How can I skip training and directly run testing on a ready-made image?

You will find the answer at the beginning of this thread. First, you have to download the weights and load their path directly into the args of main.py. Then, you have to initialize a single-image dataset if you want to test it out, or just use plain PyTorch for the inference stage.

A great PyTorch course I recently finished is this one; you will find a good explanation of the training and testing you are asking about.

Sorry, I was dealing with an urgent matter for a few days. Thank you very much for your reply. I tried the method you mentioned earlier on this line, but encountered some problems. Can you help me take a look? I think email would be more convenient; may I have your email address? Thanks.

from r2gen.

rhyhck avatar rhyhck commented on August 10, 2024

(quoting @zaaachos's instructions above on saving the predicted and gold captions to CSV)

Hello, after adding the above code following your instructions, I encountered the problem below. Can you tell me how to fix it? Thanks.

Loading checkpoint: model_iu_xray.pth ...
Checkpoint loaded. Resume training from epoch 15
Traceback (most recent call last):
File "main.py", line 125, in
main()
File "main.py", line 121, in main
trainer.train()
File "/data/renhongyi/R2Gen/R2Gen-main/modules/trainer.py", line 54, in train
result = self._train_epoch(epoch)  # returns the log after the train/val/test passes
File "/data/renhongyi/R2Gen/R2Gen-main/modules/trainer.py", line 229, in _train_epoch
{i: [re] for i, re in enumerate(zip(val_images_list,val_res))})
File "/data/renhongyi/R2Gen/R2Gen-main/modules/metrics.py", line 33, in compute_scores
score, scores = scorer.compute_score(gts, res, verbose=0)
File "/data/renhongyi/R2Gen/R2Gen-main/pycocoevalcap/bleu/bleu.py", line 49, in compute_score
bleu_scorer += (hypo[0], ref)
File "/data/renhongyi/R2Gen/R2Gen-main/pycocoevalcap/bleu/bleu_scorer.py", line 171, in iadd
self.cook_append(other[0], other[1])
File "/data/renhongyi/R2Gen/R2Gen-main/pycocoevalcap/bleu/bleu_scorer.py", line 118, in cook_append
self.crefs.append(cook_refs(refs))
File "/data/renhongyi/R2Gen/R2Gen-main/pycocoevalcap/bleu/bleu_scorer.py", line 45, in cook_refs
rl, counts = precook(ref, n)
File "/data/renhongyi/R2Gen/R2Gen-main/pycocoevalcap/bleu/bleu_scorer.py", line 29, in precook
words = s.split()
AttributeError: 'tuple' object has no attribute 'split'

from r2gen.

zaaachos avatar zaaachos commented on August 10, 2024

@rhyhck You have to provide me more details about your problem. I suppose something is missing from your data

from r2gen.
