
Comments (6)

lishen commented on August 25, 2024

@taijizhao ,

The official test set was not available when I did the study, so it could not have been part of the train set. It is actually more like another holdout set.

Unfortunately, the scores are not as good as the ones on the test set I used. One thing you need to check is whether the contrast is automatically adjusted when you convert to PNG. I used "convert -auto-level" to perform the conversion.

I can also offer two possible reasons why the performance is worse:

  1. The official test set is intrinsically more difficult to classify (e.g., it has more subtle cases) than the test set I used.
  2. The official test set contains cases whose distribution differs from that of the train set I used for model development.

If you want to improve the scores on the official test set, you should do your own training on the train set.

As a side note (unpublished): I could achieve a single model AUC of 0.85 on the official test set when combining the CC and MLO views. Maybe you can do it even better.
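For reference, here is a minimal sketch of one plausible way to combine the two views. The thread does not say how the views were actually combined for the 0.85 AUC; simple per-breast score averaging is purely an assumption:

import numpy as np

def combine_views(score_cc, score_mlo):
    # Assumption: average the malignancy scores predicted from the
    # CC and MLO views of the same breast. The actual combination
    # method behind the 0.85 AUC is not described in this thread.
    return (np.asarray(score_cc) + np.asarray(score_mlo)) / 2.0

print(combine_views(0.7, 0.9))  # ~0.8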


xuranzhao711 commented on August 25, 2024

@lishen
Thank you very much for your kind explanations! I'll try more.
Just one thing I want to make clear: when converting DICOM to PNG, SHOULD or SHOULD NOT the contrast be adjusted? Actually, I did the conversion with the dicom and OpenCV packages, something like this:

import dicom  # legacy import name; the package is now "pydicom"
import cv2

ds = dicom.read_file(dicom_filename)
img = ds.pixel_array  # raw stored pixel values; no windowing or contrast adjustment
img = cv2.resize(img, (896, 1152), interpolation=cv2.INTER_CUBIC)
cv2.imwrite(save_path + png_save_name, img)

This way, I think the contrast is not adjusted?
And regarding your comment:

I used "convert -auto-level" to perform the conversion.

Which Python package is this "convert -auto-level" command from?
Thank you again!


lishen commented on August 25, 2024

@xuranzhao711
The way you converted it, no contrast adjustment was done. Whether or not you adjust the contrast is not a matter of right or wrong, but it is important to be consistent between model training and evaluation.

convert is simply a Linux command from ImageMagick; it is widely available. This is the command I used:

convert -auto-level {} -resize 896x1152! ../ConvertedPNGs/{/.}.png
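
The {} and {/.} placeholders look like GNU parallel replacement strings ({/.} is the input file's basename without its extension), so the full invocation was presumably something like:

find . -name '*.dcm' | parallel 'convert -auto-level {} -resize 896x1152! ../ConvertedPNGs/{/.}.png'

For completeness, -auto-level linearly stretches the pixel values so the darkest pixel maps to black and the brightest to white. A rough Python analogue, assuming 8-bit output and intended only as an illustration:

import numpy as np

def auto_level(img):
    # Linearly stretch pixel values to the full 0-255 range,
    # similar in spirit to ImageMagick's -auto-level.
    img = img.astype(np.float32)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img, dtype=np.uint8)
    return ((img - lo) / (hi - lo) * 255.0).astype(np.uint8)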


yueliukth commented on August 25, 2024

Hi @lishen! I'm trying to reproduce your algorithm on the DDSM official train/val/test split, but I observe a relatively large AUC gap (around 8 percentage points) between the val and test sets. So far, the best val AUC I have achieved is 83%, and the test AUC of the same model is 75%. I was wondering if you observed the same, or at least a similar, AUC gap when you trained and tested on this new official split. Otherwise, I guess it may mean my model is somehow overfitting. Thank you in advance! Looking forward to your reply.


lishen commented on August 25, 2024

@irisliuyue, it's actually common to observe such a gap between the val and test sets. Sometimes the val AUC is even lower than the test AUC. It means the val and test sets have different distributions. Unfortunately, it's hard to make them more even. If you can afford the computation, simply do multiple splits or use (nested) cross-validation.
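
As a minimal, runnable sketch of the "multiple splits" idea (toy stand-in data and a simple classifier; the actual models in this thread are CNNs, so everything here is illustrative):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Toy stand-in data: in the real setting X would be image features
# and y the malignancy labels, so the AUC here is only chance level.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = rng.integers(0, 2, size=200)

aucs = []
for tr, va in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    model = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
    aucs.append(roc_auc_score(y[va], model.predict_proba(X[va])[:, 1]))

# Reporting the mean and spread across splits makes the estimate less
# dependent on any single lucky or unlucky partition.
print("val AUC across splits: %.3f +/- %.3f" % (np.mean(aucs), np.std(aucs)))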


yueliukth commented on August 25, 2024

Hi @lishen, thanks for your reply!

a) What did you mean by multiple splits? Do you suggest mixing all train/val/test images and re-splitting, or just re-splitting the train/val images while leaving the official test set untouched?

I don't see a huge difference between my train and val AUC scores, so my model generalises well to the unseen validation set (but not to the unseen test set). So I guess that if I mix train and val and then do cross-validation, the test performance won't take a huge leap anyway.

b) And you are right, I did notice that sometimes the val AUC is even lower than the test AUC, but it's very rare. In general, from my observation, my test AUC is mostly around 8 points lower than validation. One explanation could be that val and test differ systematically, for example by having different distributions, as you said. I did try to plot histograms of reading difficulty across the train/val/test datasets according to BI-RADS assessment, but their distributions are almost identical.

So I was wondering if you have any advice on how to demonstrate that the distributions differ across datasets? Thanks!
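
One simple check for this kind of distribution shift (not suggested in the thread itself) is a two-sample Kolmogorov-Smirnov test on per-image summary statistics such as mean pixel intensity. A minimal sketch with toy data, since the actual image arrays are not part of this thread:

import numpy as np
from scipy.stats import ks_2samp

# Toy per-image mean intensities; in practice these would be computed
# from the val and test images, e.g. np.array([im.mean() for im in val]).
rng = np.random.default_rng(0)
val_means = rng.normal(120.0, 15.0, size=300)
test_means = rng.normal(125.0, 15.0, size=300)

# A small p-value suggests the two sets of statistics were not drawn
# from the same distribution.
stat, p = ks_2samp(val_means, test_means)
print("KS statistic=%.3f, p-value=%.3g" % (stat, p))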

