
Comments (7)

Xqq2620xx commented on August 11, 2024

> Maybe you are right, there is a possibility that something in the IU-Xray dataset itself is causing this to happen. In response to your answer, I can agree with most of your points. By the way, I also appreciate your patient reply and wonderful work contribution. Thank you.

Hello! I've encountered the same problem as you! I also achieved very good results in the 1st epoch, but the generated sentences are all repetitive. I would like to share my thoughts and discuss them with you:

I have tried R2Gen, R2GenCMN, and XProNet, and their results on IU-Xray were very unstable. (You mentioned that R2GenCMN reached its highest value at the 25th epoch, but I have also seen cases where the highest value occurred in the first five epochs.) I have also modified my own model and run into situations where the 1st epoch gave very high results.

Currently, everyone (i.e., the previous papers) reports the best validation result, and the evaluation metrics do not include any measure of diversity. I think there may be no good solution to this problem at the moment: taking the average of the results over all epochs, or just using the result from the final epoch, doesn't seem appropriate either.
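One way to surface this issue alongside the standard metrics is to also track simple diversity statistics over the generated reports. Here is a minimal pure-Python sketch (these helpers and the toy report strings are my own illustration, not part of XProNet or the other codebases):

```python
from collections import Counter

def distinct_n(reports, n):
    """Fraction of unique n-grams among all n-grams in the generated reports."""
    ngrams = Counter()
    for report in reports:
        tokens = report.split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0

def unique_report_ratio(reports):
    """Fraction of generated reports that are distinct strings."""
    return len(set(reports)) / len(reports) if reports else 0.0

# Toy example: a collapsed model repeats one report verbatim.
collapsed = ["no acute cardiopulmonary abnormality ."] * 4
varied = [
    "no acute cardiopulmonary abnormality .",
    "the heart size is normal .",
    "there is a small left pleural effusion .",
    "the lungs are clear .",
]
print(unique_report_ratio(collapsed))  # 0.25
print(unique_report_ratio(varied))     # 1.0
```

Logging `unique_report_ratio` and Distinct-1/Distinct-2 per epoch next to BLEU would at least make the "high score, one repeated report" failure visible in the training log.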

In addition, I found that using an LSTM as the decoder, compared to a Transformer, can yield better diversity, but I don't understand the specific reason behind this.

However, on MIMIC-CXR, the above situation was largely alleviated, and the results were relatively stable. At least in the experiments I conducted, I did not encounter cases where the first five epochs had very high results. Perhaps we can explore more on MIMIC-CXR.

I think that we need better and more reasonable metrics to evaluate the ability of radiology report generation models😂~

from xpronet.

Markin-Wang commented on August 11, 2024

Hi, thanks for your interest in our work. Note that for radiology report generation, precision is more important than the diversity of the reports. To validate our method, our work follows the most notable works in this area: we utilized six widely used evaluation metrics to gauge the performance of our model. We also observed the same phenomenon in our experiments with different models, e.g., R2Gen (see this issue) and R2GenCMN, on the IU-Xray dataset. A possible reason is that IU-Xray contains both frontal and lateral views, so it is difficult for the visual extractor to capture the differences between samples, and the model is therefore likely to generate similar reports. Besides, IU-Xray is a small dataset, so the diversity of its reports is lower than in the MIMIC-CXR dataset. I hope this helps you figure out the problem.


Leepoet commented on August 11, 2024

Hi, thanks for your reply.
I have tried to reproduce the work of R2GenCMN, and finally found that its best model and best results are generated around the 25th epoch, which is within the acceptable range in my opinion.
As I said earlier, a best model obtained in the first few epochs is usually of little reference value. In my recent reproduction experiments, I got pretty good results in the first epoch, but eventually found that only one kind of report was being generated. Does this mean the model was not well trained in those early epochs, so the diversity of the generated reports is poor even though the six evaluation metrics look good? Further, if this is the case, please forgive my bold doubts, but the validity of the method you propose may be less convincing.
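The "only one kind of report" failure described above can be caught automatically during validation. As a sketch (a hypothetical sanity check, not code from any of the repos discussed here):

```python
from collections import Counter

def mode_collapse_score(reports):
    """Fraction of generations identical to the single most frequent report.
    A value near 1.0 means the model is emitting essentially one report."""
    if not reports:
        return 0.0
    _, top_count = Counter(reports).most_common(1)[0]
    return top_count / len(reports)

def flag_collapsed_epoch(reports, threshold=0.5):
    """True if more than `threshold` of the generated reports are identical."""
    return mode_collapse_score(reports) > threshold

# An epoch whose high BLEU comes from one repeated report gets flagged.
reports = ["no acute findings ."] * 9 + ["heart size is normal ."]
print(mode_collapse_score(reports))   # 0.9
print(flag_collapsed_epoch(reports))  # True
```

Running such a check at each validation step would let you discard early "best" checkpoints whose scores come from a single repeated report.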


Markin-Wang commented on August 11, 2024

> Hi, thanks for your reply. I have tried to reproduce the work of R2GenCMN, and finally found that its best model and best results are generated around the 25th epoch, which is within the acceptable range in my opinion. As I said earlier, the best model obtained in the first few epochs is usually not of reference value. In my recent repro experiments, I got pretty good results in the first epoch, but in the end found that only one kind of report was generated. Does this mean that the model was not well trained in the first few epochs so that the diversity of the generated reports is poor but the six evaluation indicators of the results are good? Further, if this is the case, please forgive my bold doubts, then the validity of the method you propose may lack some convincing.

Hi, I guess the epoch at which the best performance occurs is also influenced by hyper-parameters such as the learning rate, and by the working environment, in addition to the method itself. As we mentioned earlier, our work follows the most notable works in this area, such as R2Gen and R2GenCMN: we utilized six widely used evaluation metrics to gauge the performance of our model. In addition, from my perspective, the real problem is that NLP evaluation metrics may not reflect the true performance of the model, which is a common problem in text generation tasks. This is why we normally focus more on larger datasets such as MIMIC-CXR to mitigate this issue. Moreover, higher diversity does not always come with higher precision; SME (subject-matter expert) involvement is required to truly gauge this.
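One pragmatic compromise between "take the best validation result" and "ignore suspicious early epochs" would be to select the best-scoring checkpoint only among epochs that pass a diversity filter. A minimal sketch (the function, dict keys, and the numbers in the toy log are all hypothetical illustrations, not values from any of these papers):

```python
def select_best_epoch(epoch_results, min_unique_ratio=0.5):
    """Pick the best-scoring epoch among those whose generations are diverse enough.

    epoch_results: list of dicts with hypothetical keys 'epoch', 'bleu4',
    and 'unique_ratio' (fraction of distinct generated reports).
    Returns the chosen dict, or None if no epoch passes the filter."""
    candidates = [r for r in epoch_results if r["unique_ratio"] >= min_unique_ratio]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["bleu4"])

# Toy log: epoch 1 has the highest BLEU-4 but near-identical generations.
log = [
    {"epoch": 1, "bleu4": 0.19, "unique_ratio": 0.03},
    {"epoch": 12, "bleu4": 0.16, "unique_ratio": 0.62},
    {"epoch": 25, "bleu4": 0.17, "unique_ratio": 0.71},
]
print(select_best_epoch(log)["epoch"])  # 25
```

This doesn't solve the deeper metric problem, but it would keep a collapsed first-epoch checkpoint from being reported as the "best" model.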


Leepoet commented on August 11, 2024

Maybe you are right; there is a possibility that something in the IU-Xray dataset itself is causing this to happen. I can agree with most of the points in your answer. By the way, I also appreciate your patient replies and your excellent work. Thank you.


Markin-Wang commented on August 11, 2024

> Maybe you are right, there is a possibility that something in the iu-xray dataset itself is causing this to happen. In response to your answer, I can agree with most of your points. By the way, I also appreciate your patient reply and wonderful work contribution. Thank you.

Never mind, and thank you for your interest in our work and the concrete discussion. Please feel free to reach out again if you have any other questions.

from xpronet.

ThatNight commented on August 11, 2024

@Leepoet Hello Leepoet, I have repeated the experiment many times and it is difficult to reproduce the results on the IU-Xray dataset. Could you share the parameters in utils.py for the IU-Xray dataset, or the random seed?
