
medical-vqa's Introduction

Medical-VQA

111-2 Medical Image System Term Project

medical-vqa's Issues

Preprocess Images

Basic Dataset Stats

Note that the images in the train set and the test set overlap.
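
The overlap can be checked directly from the split annotation files. A minimal sketch, assuming VQA-RAD-style JSON lists with an image_name field (the file names and schema here are assumptions, not verified against the repo):

import json

def image_overlap(train_json: str, test_json: str) -> set:
    """Return the image file names shared by the two splits (assumed schema)."""
    with open(train_json) as f:
        train_imgs = {item['image_name'] for item in json.load(f)}
    with open(test_json) as f:
        test_imgs = {item['image_name'] for item in json.load(f)}
    return train_imgs & test_imgs

# e.g. len(image_overlap('trainset.json', 'testset.json'))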

Preprocessing Time

HEAD_CT: 36.10 seconds
ABD_CT: 288.61 seconds
CHEST_X-Ray: 58.11 seconds
HEAD_MRI: 1276.83 seconds

Preprocessing Configuration 2023.05.14 16-43

preprocessing/__init__.py

# Maps each modality to an ordered list of preprocessing steps; each step is
# a tuple of (function, *args) applied to the image in sequence.
_DEFAULT_PIPELINE_STEPS = {
    'HEAD_CT':
    [(remove_text, _MORPHOLOGY_KERNEL, _GAUSS_VALUE, (_CANNY_MIN, _CANNY_MAX)),
     (hu_transform, *_DEFAULT_HU_TRANSFORM_PARAMS['HEAD']),
     (adjust_tilt, 'HEAD')],
    'HEAD_MRI': [
        (remove_text, _MORPHOLOGY_KERNEL, _GAUSS_VALUE, (_CANNY_MIN,
                                                         _CANNY_MAX)),
        # (hu_transform, *_DEFAULT_HU_TRANSFORM_PARAMS['HEAD']),
        (adjust_tilt, 'HEAD'),
        (fcm_norm, _FCM_NORM_VALUE),
    ],
    'ABD_CT': [
        (remove_text, _MORPHOLOGY_KERNEL, _GAUSS_VALUE, (_CANNY_MIN,
                                                         _CANNY_MAX)),
        # (hu_transform, *_DEFAULT_HU_TRANSFORM_PARAMS['ABD']),
        (median_filter, ),
        # (wiener_filter, ),
        # (adjust_tilt, 'ABD')
    ],
    'CHEST_X-Ray': [
        (remove_text, _MORPHOLOGY_KERNEL, _GAUSS_VALUE, (_CANNY_MIN,
                                                         _CANNY_MAX)),
        (norm, ),
        (create_clahe, 1, 16),
        (gauss_blur, ),
        # (adjust_tilt, 'CHEST')
    ]
}
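
For context, a dict like this is presumably consumed by a small driver that applies each step in order. A minimal sketch (apply_pipeline is a hypothetical helper, not necessarily the repo's actual entry point):

def apply_pipeline(img, modality: str):
    """Apply each (function, *args) step registered for the modality, in order."""
    for func, *args in _DEFAULT_PIPELINE_STEPS[modality]:
        img = func(img, *args)
    return img

# e.g. processed = apply_pipeline(raw_img, 'CHEST_X-Ray')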

preprocessing/config.py

import numpy as np

_DEFAULT_HU_TRANSFORM_PARAMS = {
    'HEAD': (48, 68),
    'ABD': (70, 104)
}
# define the low-quality images
# see preprocess branch: image-inspection/measure_ct_noise.py
# Q4 images
_CT_NOISE_FUNCS = [
    'median_filter',
    'wiener_filter',
]
_TO_APPLY_ABD_FILTERS = [
    'synpic23631.jpg', 'synpic41050.jpg', 'synpic32136.jpg', 'synpic22791.jpg',
    'synpic19605.jpg', 'synpic26697.jpg', 'synpic21902.jpg', 'synpic42157.jpg',
    'synpic16520.jpg', 'synpic40596.jpg', 'synpic48714.jpg', 'synpic46943.jpg',
    'synpic33889.jpg', 'synpic23571.jpg', 'synpic23008.jpg', 'synpic28180.jpg',
    'synpic42951.jpg', 'synpic33844.jpg', 'synpic54823.jpg', 'synpic38630.jpg',
    'synpic26158.jpg', 'synpic22684.jpg', 'synpic22982.jpg', 'synpic22020.jpg',
    'synpic58261.jpg', 'synpic45914.jpg', 'synpic34922.jpg', 'synpic28695.jpg',
    'synpic43433.jpg', 'synpic21028.jpg', 'synpic29219.jpg', 'synpic24967.jpg',
    'synpic24220.jpg'
]
_MORPHOLOGY_KERNEL = np.ones((5, 5), np.uint8)
_GAUSS_VALUE = 5
_CANNY_MIN = 230
_CANNY_MAX = 250
_FCM_NORM_VALUE = 0.8
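
As a reference for the CHEST_X-Ray steps, create_clahe(1, 16) looks like a thin wrapper around OpenCV's CLAHE. A minimal sketch under that assumption (the repo's actual wrapper may differ):

import cv2
import numpy as np

def create_clahe(img: np.ndarray, clip_limit: float = 1, tile: int = 16) -> np.ndarray:
    """Contrast Limited Adaptive Histogram Equalization on a uint8 grayscale image."""
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=(tile, tile))
    return clahe.apply(img)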

Note

  1. Some HEAD_MRI images do not have a clear enough contour for adjust_tilt
     to work on, so these images are skipped for that step (> 25 images).
  2. lib/utils/create_resized_images.py needs to be run again (per
     lib/utils/run.sh) after preprocessing the images. Beware of the channel
     number.

[Bugfix] Label equality metrics

Issue Description

Problem: the scores computed from the CSV file are much lower than those from
the inference script.
Potential pitfall: their label encoding applies more preprocessing beforehand,
some of which I did not replicate.

Scores are computed only after converting labels back to text:

             Train    Val
Answer Acc   91.22%   67.63%

The quality check I used:

import pandas as pd

# calculate accuracy
def check_eq(r: pd.Series) -> bool:
    """
    TODO: remove puncts
    """
    a = r['answer']
    p = r['predicted_answer'] if 'predicted_answer' in r else r['pred_answer']
    # NaN answers are parsed as floats; treat them as empty strings
    if isinstance(a, float):
        a = ''
    if isinstance(p, float):
        p = ''
    return a.lower() == p.lower()
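
Applied row-wise, this yields the CSV-side accuracy directly (the file name below is hypothetical):

df = pd.read_csv('predictions.csv')  # hypothetical export of answers and predictions
acc = df.apply(check_eq, axis=1).mean()
print(f'Answer Acc: {acc:.2%}')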

Scores computed after the script finishes (presumably the numbers used for the paper report):

451 181.0 270.0     # total, open, close question counts
[Validate] Acc:71.175163% | Open_ACC:56.906078% | Close_ACC:80.740738%
3064 1251.0 1813.0  # total, open, close question counts
[Train] Acc:98.727158% | Open_ACC:97.921661% | Close_ACC:99.282951%
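
As a consistency check, the overall accuracy is the count-weighted mean of the open and close accuracies, which matches the printed counts:

open_n, close_n = 181, 270
open_acc, close_acc = 0.56906078, 0.80740738
overall = (open_n * open_acc + close_n * close_acc) / (open_n + close_n)
print(f'{overall:.6%}')  # 71.175163%, matching the [Validate] line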

[Survey] Visual Encoders and CLIP

CLIP

Pulls each paired image embedding and text embedding as close together as
possible (contrastive learning).

Training inside CLIP:

  1. Text encoder: always the Transformer architecture used in GPT-2; the
     image encoder can be ResNet-50, 101, 50x4, 50x64, ViT-B/32, ViT-B/16,
     etc.
_MODELS = {
    "RN50": "https://openaipublic.azureedge.net/clip/models/afeb0e10f9e5a86da6080e35cf09123aca3b358a0c3e3b6c78a7b63bc04b6762/RN50.pt",
    "RN101": "https://openaipublic.azureedge.net/clip/models/8fa8567bab74a42d41c5915025a8e4538c3bdbe8804a470a72f30b0d94fab599/RN101.pt",
    "RN50x4": "https://openaipublic.azureedge.net/clip/models/7e526bd135e493cef0776de27d5f42653e6b4c8bf9e0f653bb11773263205fdd/RN50x4.pt",
    "RN50x16": "https://openaipublic.azureedge.net/clip/models/52378b407f34354e150460fe41077663dd5b39c54cd0bfd2b27167a4a06ec9aa/RN50x16.pt",
    "RN50x64": "https://openaipublic.azureedge.net/clip/models/be1cfb55d75a9666199fb2206c106743da0f6468c9d327f3e0d0a543a9919d9c/RN50x64.pt",
    "ViT-B/32": "https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt",
    "ViT-B/16": "https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt",
    "ViT-L/14": "https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt",
}
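
For reference, a minimal sketch of loading one of these checkpoints with the openai/clip package and scoring image-text pairs (the model choice, file name, and prompts are illustrative only):

import clip
import torch
from PIL import Image

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, preprocess = clip.load('RN50x4', device=device)

image = preprocess(Image.open('xray.jpg')).unsqueeze(0).to(device)  # hypothetical file
text = clip.tokenize(['a chest x-ray', 'a head CT scan']).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)  # similarity of the image to each prompt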

[Experiment] Basic Image Preprocessing x QCRPubMedCLIP

Preliminary work: record the VQA accuracy on images without any special processing

  • Run with Visual Encoder RN50x4 and record acc
  • Run with Visual Encoder ViT and record acc

Run three experiments with the specially processed images

  • #9
  • Record the preprocessing steps applied to the images (and the rationale behind them) @liu @zhihao
  • Run with Visual Encoder RN50 (best) and record acc
  • Run with Visual Encoder RN50x4 and record acc
  • Run with Visual Encoder ViT and record acc

[Code Tracking] attn module

Diverse Attention

# line: 191
# Diverse attention -> separate attention streams for open and close questions
att_close, _ = self.close_att(v_close, q_close)
att_open, _ = self.open_att(v_open, q_open)
# bilinear residual network fuses visual and question features per stream
last_output_close = self.close_resnet(v_close, q_close, att_close)
last_output_open = self.open_resnet(v_open, q_open, att_open)

Type Attention

  • QCR_PubMedCLIP/lib/language/classify_question.py
  • element-wise product with question-type attention (the code uses *, not a
    true dot product)

last_output_close = last_output_close * typeatt_close
last_output_open = last_output_open * typeatt_open

class typeAttention(nn.Module):
    def __init__(self, size_question, path_init):
        super(typeAttention, self).__init__()
        self.w_emb = WordEmbedding(size_question, 300, 0.0, False)
        self.w_emb.init_embedding(path_init)
        self.q_emb = QuestionEmbedding(300, 1024, 1, False, 0.0, 'GRU')
        self.q_final = QuestionAttention(1024)
        self.f_fc1 = linear(1024, 2048)
        self.f_fc2 = linear(2048, 1024)
        self.f_fc3 = linear(1024, 1024)

    def forward(self, question):
        question = question[0]
        w_emb = self.w_emb(question)
        q_emb = self.q_emb.forward_all(w_emb)  # [batch, q_len, q_dim]
        q_final = self.q_final(w_emb, q_emb)  # b, 1024

        x_f = self.f_fc1(q_final)
        x_f = F.relu(x_f)
        x_f = self.f_fc2(x_f)
        x_f = F.dropout(x_f)  # note: F.dropout defaults to p=0.5 and training=True, even in eval mode
        x_f = F.relu(x_f)
        x_f = self.f_fc3(x_f)

        return x_f
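
The 1024-d output acts as a per-dimension gate on each answer stream; a shape-level sketch with dummy tensors (for illustration only):

import torch

batch = 8
last_output_open = torch.randn(batch, 1024)   # fused open-stream features
typeatt_open = torch.randn(batch, 1024)       # output of typeAttention(question)
gated_open = last_output_open * typeatt_open  # element-wise gating, shape (8, 1024)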

BiAttention

  • Looks like this part is textual
  • Open attention
  • Closed attention
  • QCR_PubMedCLIP/lib/BAN/multi_level_model.py
# QCR_PubMedCLIP/lib/BAN/multi_level_model.py
# Create BAN model
class BAN_Model(nn.Module):
    def __init__(self, dataset, cfg, device):
        super(BAN_Model, self).__init__()

        self.cfg = cfg
        self.dataset = dataset
        self.device = device
        # init word embedding module, question embedding module, biAttention network, bi_residual network, and classifier
        self.w_emb = WordEmbedding(dataset.dictionary.ntoken, 300, .0, cfg.TRAIN.QUESTION.CAT)
        self.q_emb = QuestionEmbedding(600 if cfg.TRAIN.QUESTION.CAT else 300, cfg.TRAIN.QUESTION.HID_DIM, 1, False, .0, cfg.TRAIN.QUESTION.RNN)

        # for close att+ resnet + classify
        self.close_att = BiAttention(dataset.v_dim, cfg.TRAIN.QUESTION.HID_DIM, cfg.TRAIN.QUESTION.HID_DIM, cfg.TRAIN.ATTENTION.GLIMPSE)
        self.close_resnet = BiResNet(cfg, dataset)

        self.close_classifier = SimpleClassifier(cfg.TRAIN.QUESTION.CLS_HID_DIM, cfg.TRAIN.QUESTION.CLS_HID_DIM * 2, dataset.num_close_candidates, cfg)

        # for open_att + resnet + classify
        self.open_att = BiAttention(dataset.v_dim, cfg.TRAIN.QUESTION.HID_DIM, cfg.TRAIN.QUESTION.HID_DIM, cfg.TRAIN.ATTENTION.GLIMPSE)
        self.open_resnet = BiResNet(cfg, dataset)
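
BiAttention here computes a bilinear attention map between visual regions and question tokens, as in BAN. A toy single-glimpse sketch with dummy tensors (not BAN's exact implementation, which adds low-rank factorization and multiple glimpses):

import torch
import torch.nn.functional as F

# toy dims: batch, visual regions, visual dim, question length, question dim, hidden
B, n_regions, v_dim, q_len, q_dim, h = 8, 36, 1024, 12, 1024, 512
v = torch.randn(B, n_regions, v_dim)  # visual features
q = torch.randn(B, q_len, q_dim)      # question features
W_v = torch.randn(v_dim, h)
W_q = torch.randn(q_dim, h)
logits = (v @ W_v) @ (q @ W_q).transpose(1, 2)  # (B, n_regions, q_len)
# normalize jointly over all region-token pairs to get the attention map
att = F.softmax(logits.flatten(1), dim=1).view(B, n_regions, q_len)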

[Code Tracking]

QCR_PubMedCLIP/main/main.py

# line 75 
glove_weights_path = os.path.join(data_dir, "glove6b_init_300d.npy")
question_classify = classify_model(d.ntoken, glove_weights_path)
if cfg.DATASET.DATASET == "SLAKE":
    ckpt = './saved_models/type_classifier_slake.pth'
    # the SLAKE checkpoint wraps its weights under 'model_state'
    pretrained_model = torch.load(ckpt, map_location='cuda:0')['model_state']
else:
    ckpt = './saved_models/type_classifier.pth'
    qtype_ckpt = './saved_models/qtype_classifier.pth'
    # this checkpoint stores the state dict directly
    pretrained_model = torch.load(ckpt, map_location='cuda:0')
question_classify.load_state_dict(pretrained_model)

However, qtype_ckpt no longer seems to be used, and that pretrained checkpoint is not present in saved_models.
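
If both checkpoint layouts ever need to be handled uniformly, one option is to unwrap conditionally; a sketch, not the repo's code:

import torch

state = torch.load(ckpt, map_location='cuda:0')
# some checkpoints wrap their weights under 'model_state'; unwrap if present
if isinstance(state, dict) and 'model_state' in state:
    state = state['model_state']
question_classify.load_state_dict(state)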
