

medclip's People

Contributors

ryanwangzf, zzachw


medclip's Issues

The zero-shot classification performance on COVID-19 and RSNA doesn't reach the expected results

I reproduced MedCLIP with MIMIC and CheXpert as the training datasets. On CheXpert-5x200 the results look as expected, but when I do zero-shot classification on COVID-19 and RSNA, the accuracy on the two datasets is only 0.45 and 0.43. I think there is something wrong with my data segmentation or data processing. Could you supply the data processing script and the data splits?
Thanks very much!

Inquiries about the text encoder's pretrained weights

I apologize for asking about code released a few years ago, but it appears that the text encoder directly loads the BioClinicalBERT weights rather than weights pretrained through contrastive learning. From my understanding, the text encoder was not frozen during training. Are there no pretrained weights available for the MedCLIP text encoder other than the BioClinicalBERT ones?

Model weights don't seem to load correctly while running the example

The results I get also seem to differ from what is expected, and I am wondering whether it has something to do with the warning below.
Here are the logits, which are completely different from the example shown in the README.

{'logits': tensor([[0.3603, 0.4735, 0.1625, 0.2380, 0.3830]], device='cuda:0',
       grad_fn=<StackBackward0>), 'class_names': ['Atelectasis', 'Cardiomegaly', 'Consolidation', 'Edema', 'Pleural Effusion']}
Some weights of the model checkpoint at microsoft/swin-tiny-patch4-window7-224 were not used when initializing SwinModel: ['classifier.weight', 'classifier.bias']
- This IS expected if you are initializing SwinModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing SwinModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at emilyalsentzer/Bio_ClinicalBERT were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
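
A hedged sanity check, not from the repository, assuming the README-style setup with MedCLIPModel and MedCLIPVisionModelViT: snapshot one parameter before and after from_pretrained(). If nothing changes, the MedCLIP checkpoint was not actually applied and only the Hugging Face backbone weights behind the warnings above are in use.

import copy
import torch
from medclip import MedCLIPModel, MedCLIPVisionModelViT

# Build the model, then check whether from_pretrained() actually changes the weights.
model = MedCLIPModel(vision_cls=MedCLIPVisionModelViT)
before = copy.deepcopy(next(model.parameters()))
model.from_pretrained()  # downloads and loads the released MedCLIP checkpoint
after = next(model.parameters()).detach().cpu()
print('checkpoint changed parameters:', not torch.equal(before, after))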

Pretrained weights

Hi, may I ask whether model.from_pretrained downloads a checkpoint that is pretrained on ROCO? I ask because I found that the model cannot even differentiate a dog from an X-ray image.

Thanks

Embedding running out of GPU memory

Hi,
First of all: thanks for creating MedCLIP. It seems to be an amazing library.
I'd like to embed several hundred images with the MedCLIPProcessor. However, my GPU memory fills up rather quickly, so I have to copy each and every embedding to CPU memory, which is of course rather slow.
I tried to run the embedding on the CPU instead, but the input tensors (CUDA tensors) and the weight tensors (CPU tensors) are not compatible with each other.

Is there a way to run the MedCLIPProcessor on batches of images?
Is there a way to force the input tensors to normal torch.tensors?
Is there a way to actually run the embedding process on a CPU?

Best,
Michael
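
A minimal sketch addressing the batching question above, not the library's official API: it assumes the README-style processor and model, that MedCLIPProcessor accepts an image-only call with a list of PIL images, and a hypothetical image directory. Embeddings are computed in small batches under torch.no_grad() and moved to CPU immediately so GPU memory stays bounded; encode_image is the method shown in the tracebacks elsewhere on this page.

import glob
import torch
from PIL import Image
from medclip import MedCLIPModel, MedCLIPVisionModelViT, MedCLIPProcessor

processor = MedCLIPProcessor()
model = MedCLIPModel(vision_cls=MedCLIPVisionModelViT)
model.from_pretrained()
model.cuda().eval()

image_paths = sorted(glob.glob('images/*.jpg'))  # hypothetical directory with several hundred images
batch_size = 32
all_embeds = []
with torch.no_grad():
    for i in range(0, len(image_paths), batch_size):
        batch = [Image.open(p).convert('RGB') for p in image_paths[i:i + batch_size]]
        inputs = processor(images=batch, return_tensors="pt")
        pixel_values = inputs['pixel_values'].cuda()
        embeds = model.encode_image(pixel_values)  # image embeddings for this batch
        all_embeds.append(embeds.cpu())            # move off the GPU right away
embeds = torch.cat(all_embeds)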

About using pretrained MedCLIP for fine-tuning on downstream tasks

Hello,
First, thanks for creating MedCLIP; it's great work.
In the paper, the experiments answer Q3: does MedCLIP lead to better performance and label efficiency for downstream classification tasks through fine-tuning? I would like to fine-tune the pretrained MedCLIP model on my own downstream dataset. Could you share the code for fine-tuning on a downstream task? Thank you very much!

best wishes,
Hu Tian

Does MedCLIP only perform well on the X-ray modality?

In the paper, the pre-training and evaluation datasets are all of the X-ray modality. Do you have any information about how it performs on other radiology modalities, such as mammography or MRI?
Can you share any evaluations of that sort?

Cannot run example code in Colab

Would love help fixing this: I can't run the example CLIP inference code on the example image in a clean Google Colab.

I opened a brand-new Colab notebook, uploaded the example image into the Colab filesystem, and ran the following code:

  • !pip install medclip
  • The example code for inference on the example image

I get the following error:

AttributeError                            Traceback (most recent call last)
[<ipython-input-2-d97046b8898c>](https://localhost:8080/#) in <module>
      6 processor = MedCLIPProcessor()
      7 image = Image.open('./view1_frontal.jpg')
----> 8 inputs = processor(
      9     text=["lungs remain severely hyperinflated with upper lobe emphysema", 
     10         "opacity left costophrenic angle is new since prior exam ___ represent some loculated fluid cavitation unlikely"], 

2 frames
[/usr/local/lib/python3.8/dist-packages/medclip/dataset.py](https://localhost:8080/#) in <listcomp>(.0)
    103         # transformations (convert rgb + resizing + center cropping + normalization)
    104         if self.do_convert_rgb:
--> 105             images = [self.convert_rgb(image) for image in images]
    106 
    107         if self.do_pad_square:

AttributeError: 'MedCLIPFeatureExtractor' object has no attribute 'convert_rgb'

I solved this and a subsequent padding error by turning off rgb conversion and padding:

processor.feature_extractor.do_convert_rgb = False
processor.feature_extractor.do_pad_square = False

I then convert the grayscale example image to RGB manually using example_image.convert('RGB'), convert it to a NumPy array, and move the channels dimension from last to first because the model expects channels-first input.

But then a new issue appears: all of the pixel_values become inf and NaN after the processor is called on the inputs.
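
A sketch of the workaround described above (which, as reported, still ends with inf/NaN pixel_values), assuming the attribute names shown in the issue and a README-style processor call; the manual preprocessing is reproduced here only so the resulting tensor can be inspected.

import numpy as np
from PIL import Image
from medclip import MedCLIPProcessor

processor = MedCLIPProcessor()
processor.feature_extractor.do_convert_rgb = False   # switch off the step that raised AttributeError
processor.feature_extractor.do_pad_square = False    # switch off square padding as described above

image = Image.open('./view1_frontal.jpg').convert('RGB')  # grayscale -> 3 channels
array = np.moveaxis(np.asarray(image), -1, 0)             # H x W x C -> C x H x W (channels-first)
inputs = processor(
    text=["lungs remain severely hyperinflated with upper lobe emphysema"],
    images=array,
    return_tensors="pt",
    padding=True,
)
pixel_values = inputs['pixel_values']
print(pixel_values.min(), pixel_values.max())  # inspect for inf / NaN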

Does the model support other languages?

Firstly, thanks for your great work! It has been very convenient to use in downstream tasks.

I would also like to know whether the model supports languages other than English for diagnosis text, Chinese for example, or whether it is English-only. If not, is there any way to fine-tune the model so that it suits Chinese diagnoses?

Hoping for your reply soon!

About sentence labels

Thank you for sharing your code. I find there is "na" in the extracted labels, which means some diseases are not mentioned in a sentence. Could you tell me how you handle it when calculating label similarity?
Waiting for your reply.
Thanks sincerely

Some questions about the test code

👋 Hi! Thanks for your code; I have a question to confirm.
When I run "As simple as using CLIP" from the README and print the values in inputs, I find that the values in the 'pixel_values' tensor are all exactly the same (as shown in the screenshot below). Is this normal? What causes it?

[screenshot of the printed pixel_values]

Problems when running the demo code

When I tried the "As simple as using CLIP" example you provided, I got the following error:

Traceback (most recent call last):
  File "main.py", line 26, in <module>
    outputs = model(**inputs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/MedCLIP/medclip/modeling_medclip.py", line 215, in forward
    img_embeds = self.encode_image(pixel_values)
  File "/root/MedCLIP/medclip/modeling_medclip.py", line 199, in encode_image
    vision_output = self.vision_model(pixel_values=pixel_values)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/MedCLIP/medclip/modeling_medclip.py", line 127, in forward
    img_embeds = self.projection_head(img_embeds)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

Waiting for your reply
Thanks Sincerely
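
A hedged debugging sketch, not from the repository: forcing synchronous CUDA kernel launches makes the failing call appear at the correct stack frame, and printing the shapes that feed the projection layer named in the traceback often turns the opaque CUBLAS error into a plain size mismatch. The processor call and attribute names follow the README example and the traceback above, and may differ between versions.

import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'  # synchronous launches: errors point at the real call site

import torch
from PIL import Image
from medclip import MedCLIPModel, MedCLIPVisionModelViT, MedCLIPProcessor

processor = MedCLIPProcessor()
model = MedCLIPModel(vision_cls=MedCLIPVisionModelViT)
model.from_pretrained()
model.cuda()

image = Image.open('./view1_frontal.jpg')  # the example image from the repository
inputs = processor(
    text=["lungs remain severely hyperinflated with upper lobe emphysema"],
    images=image,
    return_tensors="pt",
    padding=True,
)

# Inspect what feeds the projection layer named in the traceback before calling the model.
print({k: tuple(v.shape) for k, v in inputs.items()})  # shapes coming out of the processor
print(model.vision_model.projection_head)              # expected in/out features of the projection

outputs = model(**inputs)
print(outputs['logits'])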

Problems when training from scratch

Thanks a lot for sharing your code here. I am new to the field, and I am trying to train MedCLIP from scratch.
It would be really helpful if you could tell me how to preprocess the MIMIC-CXR and CheXpert data, e.g., how you generated the mimic-csr-train-meta.csv file and how you organized the images. Or could you please release your data preprocessing scripts on GitHub?

Thanks again for your work.

Question about obtaining sentence-label.csv

Hello,

Thanks for your amazing work!

May I ask how to obtain sentence-label.csv from the MIMIC-CXR dataset? Do you use the concatenation of the impression and findings sections as one whole sentence, or do you randomly select one sentence from each report and generate labels for it? I am a little confused about this after looking at the _preprocess_sentence_label function:

def _preprocess_sentence_label(self):

Thanks a lot!

The model weights do not seem to load correctly, and the classification accuracy of the pretrained model on CheXpert varies greatly, with an average F1 of 0.42

Using the 5-class demo from the example, a logit threshold of 0.50, and 500 frontal X-ray images from the CheXpert public test set, the test results are as follows:
[screenshot of per-class test results]
In addition, the classification results for the sample image do not match the results shown in the README:
[screenshot of the sample-image prediction]
The results should be: {'logits': tensor([[0.5154, 0.4119, 0.2831, 0.2441, 0.4588]]
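
A hedged sketch of the evaluation described above, not the repository's evaluation script: threshold the 5-class logits at 0.50 and compute macro F1 with scikit-learn. The logits and ground-truth labels below are hypothetical placeholders standing in for the (N, 5) demo outputs and the matching multi-hot CheXpert labels.

import numpy as np
import torch
from sklearn.metrics import f1_score

# Placeholders: replace with the logits collected from the 5-class demo and the real labels.
logits = torch.rand(500, 5)                  # hypothetical demo logits, shape (N, 5)
y_true = np.random.randint(0, 2, (500, 5))   # hypothetical multi-hot ground truth, same shape

threshold = 0.50
y_pred = (logits.detach().cpu().numpy() > threshold).astype(int)  # per-class binary predictions
print('average F1:', f1_score(y_true, y_pred, average='macro'))   # macro-averaged over the 5 classes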

Training from scratch with the Trainer class

Hi there, I am trying to train a MedCLIP model from scratch on the CheXpert and MIMIC-CXR datasets. I cannot find the training script that uses the Trainer class to define the model and dataloaders and train the model from scratch.

Could you by any chance provide the training script that uses the Trainer class?

The version of CheXpert

Thank you for sharing the code!

Since CheXpert v1.0 has two released versions, CheXpert-v1.0 Original (~439 GB) and CheXpert-v1.0 Downsampled (~11 GB), I would like to know which one you chose. Also, have you compared their performance, or do you have any suggestions about them?

Waiting for your kind reply.
Many thanks.

Questions about Training/Data

Hey, thank you for sharing your interesting work.

I have a few questions about the training process:

How did you pretrain on CheXpert? As far as I am aware, CheXpert does not come with text reports the way MIMIC does. Am I understanding correctly that during pre-training you use only MIMIC text reports, matched by medical semantic similarity to both MIMIC and CheXpert images?

Thanks in advance!

About the sampled subsets

In the paper, you sampled several subsets from the original datasets, e.g., a balanced subset from the RSNA Pneumonia dataset for evaluation.

Would you mind providing the info for all subsets (e.g., indices) so that we can reproduce your work?

Release MIMIC 5x200

Hi! Would you consider releasing your MIMIC zero-shot classification splits?
Thanks.

ResNet vision model seems random

The ViT model weights look reasonable, but the ResNet weights produce very random results and always negative logits. Are the provided ResNet weights correct?

Pretrained models do not load with the pip release

I installed the latest version of medclip via pip and am trying to load the pretrained models.
However, when loading the ResNet or ViT model I get the following error:

RuntimeError: Error(s) in loading state_dict for MedCLIPModel:
	Unexpected key(s) in state_dict: "text_model.model.embeddings.position_ids".

It works if I install the module directly from git.

Reproduce:

from medclip import MedCLIPModel, MedCLIPVisionModelViT, MedCLIPVisionModel

model = MedCLIPModel(vision_cls=MedCLIPVisionModel)
model.from_pretrained()
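
A hedged workaround sketch, not the library's official API: load the checkpoint file by hand, drop the key that the newer transformers version no longer expects, and load the rest non-strictly. The checkpoint path below is hypothetical; it stands in for wherever the downloaded weights live on disk.

import torch
from medclip import MedCLIPModel, MedCLIPVisionModel

model = MedCLIPModel(vision_cls=MedCLIPVisionModel)
state_dict = torch.load('path/to/pytorch_model.bin', map_location='cpu')  # hypothetical local path
state_dict.pop('text_model.model.embeddings.position_ids', None)          # the key reported as unexpected
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print('missing keys:', missing)
print('unexpected keys:', unexpected)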

Logit values differ from the demo

The values in demo_from_pretrained_and_encode.ipynb are 0.2560 and 0.6394, but I got 0.1729 and 0.3899.
The classification result is also wrong.
I downloaded the weights from the URL in the txt file, and the other weights were downloaded automatically. Is this the right way to do it?

Problems when running the demo code

Thanks for sharing your work.

After downloading the pretrained model, I ran the sample code "As simple as using CLIP" without any modifications. Then this error appears:

Traceback (most recent call last):
  File "D:/Projects/MedCLIP/11.py", line 20, in <module>
    outputs = model(**inputs)
  File "D:\Programming environment\Python--3.8.9\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Projects\MedCLIP\medclip\modeling_medclip.py", line 216, in forward
    logits_per_image = self.compute_logits(img_embeds, text_embeds)
  File "D:\Projects\MedCLIP\medclip\modeling_medclip.py", line 230, in compute_logits
    logits_per_text = torch.matmul(text_emb, img_emb.t()) * logit_scale
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x768 and 512x1)

I haven't used Hugging Face before; could this be a problem with the BERT model, or something else?

Waiting for your reply
Thanks Sincerely

