111-2 Medical Image System Term Project
nana2929 / medical-vqa
License: MIT License
Note that the images in the training set and the test set overlap.
HEAD_CT: 36.10 seconds
ABD_CT: 288.61 seconds
CHEST_X-Ray: 58.11 seconds
HEAD_MRI: 1276.83 seconds
preprocessing/__init__.py
_DEFAULT_PIPELINE_STEPS = {
    'HEAD_CT': [
        (remove_text, _MORPHOLOGY_KERNEL, _GAUSS_VALUE, (_CANNY_MIN, _CANNY_MAX)),
        (hu_transform, *_DEFAULT_HU_TRANSFORM_PARAMS['HEAD']),
        (adjust_tilt, 'HEAD'),
    ],
    'HEAD_MRI': [
        (remove_text, _MORPHOLOGY_KERNEL, _GAUSS_VALUE, (_CANNY_MIN, _CANNY_MAX)),
        # (hu_transform, *_DEFAULT_HU_TRANSFORM_PARAMS['HEAD']),
        (adjust_tilt, 'HEAD'),
        (fcm_norm, _FCM_NORM_VALUE),
    ],
    'ABD_CT': [
        (remove_text, _MORPHOLOGY_KERNEL, _GAUSS_VALUE, (_CANNY_MIN, _CANNY_MAX)),
        # (hu_transform, *_DEFAULT_HU_TRANSFORM_PARAMS['ABD']),
        (median_filter, ),
        # (wiener_filter, ),
        # (adjust_tilt, 'ABD')
    ],
    'CHEST_X-Ray': [
        (remove_text, _MORPHOLOGY_KERNEL, _GAUSS_VALUE, (_CANNY_MIN, _CANNY_MAX)),
        (norm, ),
        (create_clahe, 1, 16),
        (gauss_blur, ),
        # (adjust_tilt, 'CHEST')
    ],
}
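Each entry in `_DEFAULT_PIPELINE_STEPS` pairs a function with its extra arguments. A minimal sketch of how such a list could be applied, assuming every step function takes the image as its first argument and returns the processed image. The runner `run_pipeline` and the toy steps `scale` / `clip01` are hypothetical stand-ins, not the repository's code:

```python
from typing import Any, Sequence, Tuple

import numpy as np

# A step is (function, *extra_args), e.g. (adjust_tilt, 'HEAD') or (median_filter, )
Step = Tuple[Any, ...]


def run_pipeline(img: np.ndarray, steps: Sequence[Step]) -> np.ndarray:
    """Apply each (func, *args) step in order, feeding each output into the next step.

    Assumption: every step has the signature func(img, *args) -> img.
    """
    for func, *args in steps:
        img = func(img, *args)
    return img


# Hypothetical toy steps standing in for remove_text, norm, etc.
def scale(img: np.ndarray, factor: float) -> np.ndarray:
    return img * factor


def clip01(img: np.ndarray) -> np.ndarray:
    return np.clip(img, 0.0, 1.0)


steps = [(scale, 2.0), (clip01, )]
out = run_pipeline(np.array([[0.2, 0.7]]), steps)
print(out)  # [[0.4 1. ]]
```

Under this convention, `'HEAD_CT'` would run `remove_text`, then `hu_transform`, then `adjust_tilt`, and commenting out a tuple simply drops that step.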
preprocessing/config.py
import numpy as np

_DEFAULT_HU_TRANSFORM_PARAMS = {
    'HEAD': (48, 68),
    'ABD': (70, 104)
}
# noise-reduction functions for CT images
_CT_NOISE_FUNCS = [
    'median_filter',
    'wiener_filter',
]
# define the low-quality images, i.e. the abdominal CT images to apply the noise filters to
# see preprocess branch: image-inspection/measure_ct_noise.py
# Q4 images
_TO_APPLY_ABD_FILTERS = [
    'synpic23631.jpg', 'synpic41050.jpg', 'synpic32136.jpg', 'synpic22791.jpg',
    'synpic19605.jpg', 'synpic26697.jpg', 'synpic21902.jpg', 'synpic42157.jpg',
    'synpic16520.jpg', 'synpic40596.jpg', 'synpic48714.jpg', 'synpic46943.jpg',
    'synpic33889.jpg', 'synpic23571.jpg', 'synpic23008.jpg', 'synpic28180.jpg',
    'synpic42951.jpg', 'synpic33844.jpg', 'synpic54823.jpg', 'synpic38630.jpg',
    'synpic26158.jpg', 'synpic22684.jpg', 'synpic22982.jpg', 'synpic22020.jpg',
    'synpic58261.jpg', 'synpic45914.jpg', 'synpic34922.jpg', 'synpic28695.jpg',
    'synpic43433.jpg', 'synpic21028.jpg', 'synpic29219.jpg', 'synpic24967.jpg',
    'synpic24220.jpg'
]
_MORPHOLOGY_KERNEL = np.ones((5, 5), np.uint8)
_GAUSS_VALUE = 5
_CANNY_MIN = 230
_CANNY_MAX = 250
_FCM_NORM_VALUE = 0.8
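One way to use `_TO_APPLY_ABD_FILTERS` is to denoise only the flagged files and leave the rest untouched. A sketch under stated assumptions: the image is a 2-D grayscale array and the filter is a 3×3 median filter with edge padding. The helpers `median_filter_3x3` and `maybe_denoise` are hypothetical; the repository's own `median_filter` may use a different kernel or library:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_3x3(img: np.ndarray) -> np.ndarray:
    """3x3 median filter for a 2-D image; borders are handled by edge padding."""
    padded = np.pad(img, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))  # shape (H, W, 3, 3)
    return np.median(windows, axis=(-2, -1)).astype(img.dtype)


def maybe_denoise(filename: str, img: np.ndarray, noisy_files) -> np.ndarray:
    """Filter only images listed as low quality (e.g. _TO_APPLY_ABD_FILTERS)."""
    if filename in noisy_files:
        return median_filter_3x3(img)
    return img


# a single "salt" pixel is removed by the median filter
img = np.zeros((5, 5), dtype=np.uint8)
img[2, 2] = 255
print(maybe_denoise("synpic23631.jpg", img, ["synpic23631.jpg"]).max())  # 0
```

A median filter is a common choice for impulse noise because a lone outlier never becomes the median of its window; converting the list to a `set` makes the membership check constant-time if it grows.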
adjust_tilt
lib/utils/create_resized_images.py
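Usage follows lib/utils.run.sh; run it again after preprocessing the images. Beware of the number of image channels.
Problem: the score computed from the CSV file is much lower than the score reported by the inference script.
Potential pitfall: the original code applies more processing before label encoding, and I skipped some of those steps.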
| | Train | Val |
|---|---|---|
| Answer Acc | 91.22% | 67.63% |
The quality check I used:
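import pandas as pd

# calculate accuracy
def check_eq(r: pd.Series) -> bool:
    """
    Case-insensitive exact match between the gold and the predicted answer.
    TODO: remove punctuation
    """
    a = r['answer']
    # the prediction column may be named either way
    p = r['predicted_answer'] if 'predicted_answer' in r else r['pred_answer']
    # if either of them is NaN (missing), turn it into an empty string
    if pd.isna(a):
        a = ''
    if pd.isna(p):
        p = ''
    return str(a).lower() == str(p).lower()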
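The TODO above (removing punctuation) could be handled by normalizing both strings before comparing. A minimal sketch, assuming a DataFrame with an `answer` column and a `predicted_answer` or `pred_answer` column as in `check_eq`. The helpers `normalize` and `answer_accuracy` are hypothetical, and the normalization rules (lowercase, strip ASCII punctuation, collapse whitespace) are an assumption that may not match the original label-encoding preprocessing:

```python
import string

import pandas as pd

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(ans) -> str:
    """Lowercase, drop ASCII punctuation and collapse whitespace; NaN/None becomes ''."""
    if pd.isna(ans):
        return ""
    return " ".join(str(ans).lower().translate(_PUNCT_TABLE).split())


def answer_accuracy(df: pd.DataFrame) -> float:
    """Fraction of rows whose normalized prediction equals the normalized answer."""
    pred_col = "predicted_answer" if "predicted_answer" in df.columns else "pred_answer"
    hits = [normalize(a) == normalize(p) for a, p in zip(df["answer"], df[pred_col])]
    return sum(hits) / len(hits) if hits else 0.0


df = pd.DataFrame({
    "answer": ["Yes", "left lung", "MRI", None],
    "pred_answer": ["yes.", "Left  lung", "CT", None],
})
print(answer_accuracy(df))  # 0.75
```

Like `check_eq`, this counts a missing answer paired with a missing prediction as a match; drop those rows first if they should not count.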
451 181.0 270.0 # total, open, close
[Validate] Acc:71.175163% | Open_ACC:56.906078% | Close_ACC:80.740738%
3064 1251.0 1813.0
[Train] Acc:98.727158% | Open_ACC:97.921661% | Close_ACC:99.282951%
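The count lines give the total, open and closed question counts (451 = 181 + 270 and 3064 = 1251 + 1813), so the overall accuracy should be the count-weighted average of the open and closed accuracies. A quick consistency check using the logged numbers (the helper `overall_acc` is only for illustration):

```python
def overall_acc(n_open: float, acc_open: float, n_close: float, acc_close: float) -> float:
    """Count-weighted average of open and closed accuracies (accuracies in %)."""
    return (n_open * acc_open + n_close * acc_close) / (n_open + n_close)


val = overall_acc(181, 56.906078, 270, 80.740738)
train = overall_acc(1251, 97.921661, 1813, 99.282951)
print(f"{val:.2f} {train:.2f}")  # 71.18 98.73
```

The validation numbers correspond to 103 of 181 open and 218 of 270 closed questions answered correctly, i.e. 321 / 451 overall.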
Make each paired image embedding and text embedding as close to each other as possible (contrastive learning).
Training within CLIP:
_MODELS = {
"RN50": "https://openaipublic.azureedge.net/clip/models/afeb0e10f9e5a86da6080e35cf09123aca3b358a0c3e3b6c78a7b63bc04b6762/RN50.pt",
"RN101": "https://openaipublic.azureedge.net/clip/models/8fa8567bab74a42d41c5915025a8e4538c3bdbe8804a470a72f30b0d94fab599/RN101.pt",
"RN50x4": "https://openaipublic.azureedge.net/clip/models/7e526bd135e493cef0776de27d5f42653e6b4c8bf9e0f653bb11773263205fdd/RN50x4.pt",
"RN50x16": "https://openaipublic.azureedge.net/clip/models/52378b407f34354e150460fe41077663dd5b39c54cd0bfd2b27167a4a06ec9aa/RN50x16.pt",
"RN50x64": "https://openaipublic.azureedge.net/clip/models/be1cfb55d75a9666199fb2206c106743da0f6468c9d327f3e0d0a543a9919d9c/RN50x64.pt",
"ViT-B/32": "https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt",
"ViT-B/16": "https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt",
"ViT-L/14": "https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt",
}
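The contrastive objective described above can be sketched as a symmetric cross-entropy over the image-text similarity matrix: matching pairs sit on the diagonal, each image must pick out its own text among the batch, and each text its own image. A NumPy sketch assuming L2-normalized embeddings and a fixed temperature of 0.07 (CLIP itself learns the temperature as a parameter); `clip_contrastive_loss` is a hypothetical name, not part of the CLIP package:

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def clip_contrastive_loss(img_emb: np.ndarray, txt_emb: np.ndarray,
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss; row i of img_emb is paired with row i of txt_emb."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # [batch, batch] scaled cosine similarities
    idx = np.arange(len(logits))        # correct pairs lie on the diagonal
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # each image vs all texts
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # each text vs all images
    return float((loss_i2t + loss_t2i) / 2)


rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
aligned = clip_contrastive_loss(emb, emb + 0.01 * rng.normal(size=(8, 16)))
shuffled = clip_contrastive_loss(emb, emb[::-1])  # every pair mismatched
```

Minimizing this loss pulls paired embeddings together and pushes the other in-batch pairs apart, so `aligned` ends up far below `shuffled`.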
451 181.0 270.0 # total, open, close
[Validate] Val_Acc:71.175163% | Open_ACC:56.906078% | Close_ACC:80.740738%
3064 1251.0 1813.0
[Train] Val_Acc:98.727158% | Open_ACC:97.921661% | Close_ACC:99.282951%
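Preliminary step: record the VQA accuracy on images without any special preprocessing.
Run three experiments using the specially preprocessed images.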
# line: 191
# diverse attention -> (open + close)
att_close, _ = self.close_att(v_close, q_close)
att_open, _ = self.open_att(v_open, q_open)
# bilinear residual network
last_output_close = self.close_resnet(v_close, q_close, att_close)
last_output_open = self.open_resnet(v_open, q_open, att_open)
# gate each branch's output element-wise with its question-type attention vector
last_output_close = last_output_close * typeatt_close
last_output_open = last_output_open * typeatt_open
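The last two lines of that snippet scale every feature of a branch independently by the question-type vector. A NumPy sketch of the gating step, assuming both tensors are `[batch, 1024]` (matching the 1024-dimensional output of `typeAttention.f_fc3`); the random values and the zeroed half of the gate are placeholders for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim = 4, 1024

# placeholders for the close_resnet output and the close-type attention vector
last_output_close = rng.normal(size=(batch, dim))
typeatt_close = rng.normal(size=(batch, dim))
typeatt_close[:, :512] = 0.0  # hypothetical: a zero gate switches those features off

gated = last_output_close * typeatt_close  # element-wise product, shape stays [batch, dim]
```

Because the product is element-wise rather than a matrix multiplication, the two shapes must match exactly (or broadcast), and the gate can only re-weight features, not mix them.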
import torch.nn as nn
import torch.nn.functional as F

# WordEmbedding, QuestionEmbedding, QuestionAttention and linear are defined elsewhere in QCR_PubMedCLIP
class typeAttention(nn.Module):
    def __init__(self, size_question, path_init):
        super(typeAttention, self).__init__()
        self.w_emb = WordEmbedding(size_question, 300, 0.0, False)
        self.w_emb.init_embedding(path_init)
        self.q_emb = QuestionEmbedding(300, 1024, 1, False, 0.0, 'GRU')
        self.q_final = QuestionAttention(1024)
        self.f_fc1 = linear(1024, 2048)
        self.f_fc2 = linear(2048, 1024)
        self.f_fc3 = linear(1024, 1024)

    def forward(self, question):
        question = question[0]
        w_emb = self.w_emb(question)
        q_emb = self.q_emb.forward_all(w_emb)  # [batch, q_len, q_dim]
        q_final = self.q_final(w_emb, q_emb)   # [batch, 1024]
        x_f = self.f_fc1(q_final)
        x_f = F.relu(x_f)
        x_f = self.f_fc2(x_f)
        # F.dropout defaults to training=True; follow the module mode so dropout is off in eval()
        x_f = F.dropout(x_f, training=self.training)
        x_f = F.relu(x_f)
        x_f = self.f_fc3(x_f)
        return x_f
QCR_PubMedCLIP/lib/BAN/multi_level_model.py
# QCR_PubMedCLIP/lib/BAN/multi_level_model.py
# Create BAN model
class BAN_Model(nn.Module):
def __init__(self, dataset, cfg, device):
super(BAN_Model, self).__init__()
self.cfg = cfg
self.dataset = dataset
self.device = device
# init word embedding module, question embedding module, biAttention network, bi_residual network, and classifier
self.w_emb = WordEmbedding(dataset.dictionary.ntoken, 300, .0, cfg.TRAIN.QUESTION.CAT)
self.q_emb = QuestionEmbedding(600 if cfg.TRAIN.QUESTION.CAT else 300, cfg.TRAIN.QUESTION.HID_DIM, 1, False, .0, cfg.TRAIN.QUESTION.RNN)
# for close_att + resnet + classify
self.close_att = BiAttention(dataset.v_dim, cfg.TRAIN.QUESTION.HID_DIM, cfg.TRAIN.QUESTION.HID_DIM, cfg.TRAIN.ATTENTION.GLIMPSE)
self.close_resnet = BiResNet(cfg, dataset)
self.close_classifier = SimpleClassifier(cfg.TRAIN.QUESTION.CLS_HID_DIM, cfg.TRAIN.QUESTION.CLS_HID_DIM * 2, dataset.num_close_candidates, cfg)
# for open_att + resnet + classify
self.open_att = BiAttention(dataset.v_dim, cfg.TRAIN.QUESTION.HID_DIM, cfg.TRAIN.QUESTION.HID_DIM, cfg.TRAIN.ATTENTION.GLIMPSE)
self.open_resnet = BiResNet(cfg, dataset)
Main Contributor: @zhiao777774
QCR_PubMedCLIP/main/main.py
# line 75
glove_weights_path = os.path.join(data_dir, "glove6b_init_300d.npy")
question_classify = classify_model(d.ntoken, glove_weights_path)
if cfg.DATASET.DATASET == "SLAKE":
    ckpt = './saved_models/type_classifier_slake.pth'
    # the SLAKE checkpoint stores the weights under the 'model_state' key
    pretrained_model = torch.load(ckpt, map_location='cuda:0')['model_state']
else:
    ckpt = './saved_models/type_classifier.pth'
    qtype_ckpt = './saved_models/qtype_classifier.pth'  # defined but never used below
    pretrained_model = torch.load(ckpt, map_location='cuda:0')
question_classify.load_state_dict(pretrained_model)
However, qtype_ckpt no longer seems to be used, and this pretrained checkpoint is not present in saved_models.
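The two branches expect different checkpoint layouts: the SLAKE checkpoint wraps the weights under `'model_state'`, while the other is loaded as a bare state dict. A small helper can accept either layout. This is a sketch; `extract_state_dict` is a hypothetical name, and the commented usage assumes PyTorch is installed and falls back to CPU when no GPU is available instead of hard-coding `'cuda:0'`:

```python
from collections.abc import Mapping


def extract_state_dict(ckpt: Mapping) -> Mapping:
    """Return the weights whether they are wrapped as {'model_state': ...} or stored bare."""
    if "model_state" in ckpt:
        return ckpt["model_state"]
    return ckpt


# usage (assumes torch is installed):
# device = "cuda:0" if torch.cuda.is_available() else "cpu"
# ckpt = torch.load('./saved_models/type_classifier.pth', map_location=device)
# question_classify.load_state_dict(extract_state_dict(ckpt))
```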