
Comments (6)

lishen commented on August 25, 2024

@taijizhao ,

The official test set was not available when I did the study, so it could not have been part of the train set. It is actually more like another holdout set.

Unfortunately, the scores are not as good as the ones on the test set I used. One thing you need to check is whether the contrast is automatically adjusted when you convert to PNG. I used "convert -auto-level" to perform the conversion.

I can also offer two possible reasons why the performance is worse:

  1. The official test set is intrinsically more difficult to classify (e.g., it has more subtle cases) than the test set I used.
  2. The official test set contains cases whose distribution differs from that of the train set I used for model development.

If you want to improve the scores on the official test set, you should do your own training on the train set.

As a side note (unpublished): I could achieve a single model AUC of 0.85 on the official test set when combining the CC and MLO views. Maybe you can do it even better.
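For reference, here is a minimal sketch of one plausible way to combine the two views. The thread does not say how the views were actually combined for the 0.85 AUC; simple per-breast score averaging is purely an assumption:

import numpy as np

def combine_views(score_cc, score_mlo):
    # Assumption: average the malignancy scores predicted from the
    # CC and MLO views of the same breast. The actual combination
    # method behind the 0.85 AUC is not described in this thread.
    return (np.asarray(score_cc) + np.asarray(score_mlo)) / 2.0

print(combine_views(0.7, 0.9))  # ~0.8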


xuranzhao711 commented on August 25, 2024

@lishen
Thank you very much for your kind explanations! I'll try more.
Just one thing I want to make clear: when converting DICOM to PNG, SHOULD or SHOULD NOT the contrast be adjusted? Actually, I did the conversion with the dicom and OpenCV packages, something like this:

import dicom  # legacy import name; the package is now "pydicom"
import cv2

ds = dicom.read_file(dicom_filename)
img = ds.pixel_array  # raw stored pixel values; no windowing or contrast adjustment
img = cv2.resize(img, (896, 1152), interpolation=cv2.INTER_CUBIC)
cv2.imwrite(save_path + png_save_name, img)

This way, I think the contrast is not adjusted?
And regarding your comment:

I used "convert -auto-level" to perform the conversion.

Which Python package is this "convert -auto-level" command from?
Thank you again!


lishen commented on August 25, 2024

@xuranzhao711
The way you converted it, no contrast adjustment was done. Whether or not you adjust the contrast is not a matter of right or wrong, but it is important to be consistent between model training and evaluation.

convert is simply a Linux command from ImageMagick; it is widely available. This is the command I used:

convert -auto-level {} -resize 896x1152! ../ConvertedPNGs/{/.}.png
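
The {} and {/.} placeholders look like GNU parallel replacement strings ({/.} is the input file's basename without its extension), so the full invocation was presumably something like:

find . -name '*.dcm' | parallel 'convert -auto-level {} -resize 896x1152! ../ConvertedPNGs/{/.}.png'

For completeness, -auto-level linearly stretches the pixel values so the darkest pixel maps to black and the brightest to white. A rough Python analogue, assuming 8-bit output and intended only as an illustration:

import numpy as np

def auto_level(img):
    # Linearly stretch pixel values to the full 0-255 range,
    # similar in spirit to ImageMagick's -auto-level.
    img = img.astype(np.float32)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img, dtype=np.uint8)
    return ((img - lo) / (hi - lo) * 255.0).astype(np.uint8)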


yueliukth commented on August 25, 2024

Hi @lishen! I'm trying to reproduce your algorithm on the DDSM official train/val/test split, but I observe a relatively large AUC gap (around 8 percentage points) between the val and test sets. So far, the best val AUC I have achieved is 83%, and the test AUC of the same model is 75%. I was wondering if you observed the same, or at least a similar, AUC gap when you trained and tested on this new official split. Otherwise, I guess it may mean my model is somehow overfitting. Thank you in advance! Looking forward to your reply.


lishen commented on August 25, 2024

@irisliuyue, it's actually common to observe such a gap between the val and test sets. Sometimes the val AUC is even lower than the test AUC. It means the val and test sets have different distributions. Unfortunately, it's hard to make them more even. If you can afford the computation, simply do multiple splits or use (nested) cross-validation.
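
As a minimal, runnable sketch of the "multiple splits" idea (toy stand-in data and a simple classifier; the actual models in this thread are CNNs, so everything here is illustrative):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Toy stand-in data: in the real setting X would be image features
# and y the malignancy labels, so the AUC here is only chance level.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = rng.integers(0, 2, size=200)

aucs = []
for tr, va in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    model = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
    aucs.append(roc_auc_score(y[va], model.predict_proba(X[va])[:, 1]))

# Reporting the mean and spread across splits makes the estimate less
# dependent on any single lucky or unlucky partition.
print("val AUC across splits: %.3f +/- %.3f" % (np.mean(aucs), np.std(aucs)))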


yueliukth commented on August 25, 2024

Hi @lishen, thanks for your reply!

a) What did you mean by multiple splits? Do you suggest mixing all train/val/test images and re-splitting, or just re-splitting the train/val images while leaving the official test set untouched?

I don't see a huge difference between my train and val AUC scores, so my model generalises well to the unseen validation set (but not to the unseen test set). So I guess that if I mix train and val and then do cross-validation, the test performance won't take a huge leap anyway.

b) And you are right, I did notice that sometimes the val AUC is even lower than the test AUC, but it's very rare. In general, from my observation, my test AUC is mostly around 8 points lower than validation. One explanation could be that val and test differ systematically, for example by having different distributions, as you said. I did try to plot histograms of reading difficulty across the train/val/test datasets according to BI-RADS assessment, but their distributions are almost identical.

So I was wondering if you have any advice on how to demonstrate that the distributions differ across datasets? Thanks!
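
One simple check for this kind of distribution shift (not suggested in the thread itself) is a two-sample Kolmogorov-Smirnov test on per-image summary statistics such as mean pixel intensity. A minimal sketch with toy data, since the actual image arrays are not part of this thread:

import numpy as np
from scipy.stats import ks_2samp

# Toy per-image mean intensities; in practice these would be computed
# from the val and test images, e.g. np.array([im.mean() for im in val]).
rng = np.random.default_rng(0)
val_means = rng.normal(120.0, 15.0, size=300)
test_means = rng.normal(125.0, 15.0, size=300)

# A small p-value suggests the two sets of statistics were not drawn
# from the same distribution.
stat, p = ks_2samp(val_means, test_means)
print("KS statistic=%.3f, p-value=%.3g" % (stat, p))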

