
Comments (14)

naoe1999 commented on June 14, 2024

@DishaDRao

Thank you very much for your advice.
I've followed your recommendation, and it finally works for me too.

I'm using the 'wentaozhu/DeepLung' repository for training & evaluation with the LUNA16 dataset, and I have started getting meaningful FROC results.
For segmenting new CT scan data (which, unlike LUNA16, does not come with segmentation annotations), I also found this repository helpful.

Many thanks! :-)

naoe1999 commented on June 14, 2024

@DishaDRao
I have the same concern about this.

I trained fully with the existing code and got strange output that doesn't make sense at all.
So I suspect the loss function for the same reason you mentioned.

Did you get any results with it? I will try it anyway.

DishaDRao commented on June 14, 2024

@naoe1999

Well, I did not try this code. However, I went through the code base of the original implementation (the winning team's) and understood where this labelling came from.

Basically, in the original implementation, the target labels for negative anchor boxes ('neg_labels') are given the label '-1'. Hence it makes sense to write 'neg_labels + 1' during the loss computation, which maps them to 0 (0 stands for no object and 1 stands for an object).

However, in the current code base, the target labels for negative anchor boxes are already given the label '0', so it doesn't make sense to write 'neg_labels + 1' during the loss computation.

In short, I think it's a mistake here, and I suggest running this code without adding 1 to neg_labels:

classify_loss = 0.5 * self.classify_loss(
    pos_prob, pos_labels[:, 0]) + 0.5 * self.classify_loss(
    neg_prob, neg_labels)

Hope this works. If not, then it's an issue in some other part of the code!
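
To make the difference between the two conventions concrete, here is a minimal, hypothetical sketch (dummy tensors, not the repo's actual variables) of why the +1 is needed when negatives are labelled -1 but harmful when they are already 0:

import torch
import torch.nn as nn

bce = nn.BCELoss()   # stands in for self.classify_loss

# Original (winning-team) convention: negatives are labelled -1.
neg_labels_orig = torch.tensor([-1.0, -1.0, -1.0])
target_orig = neg_labels_orig + 1              # +1 maps -1 -> 0 ("no object")

# This repo's convention: negatives are already labelled 0.
neg_labels_here = torch.tensor([0.0, 0.0, 0.0])

neg_prob = torch.tensor([0.10, 0.20, 0.05])    # dummy predicted probabilities
print(bce(neg_prob, target_orig))              # small loss, as intended
print(bce(neg_prob, neg_labels_here))          # also small: labels are already 0
print(bce(neg_prob, neg_labels_here + 1))      # large loss: negatives treated as objects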

MHansy commented on June 14, 2024

Hello, kindly help me with the testing code (how to test the trained model) so as to get the predicted nodules.

MHansy commented on June 14, 2024

Testing code, please.

naoe1999 commented on June 14, 2024

@DishaDRao
Thank you for your advice.

However, even after I changed the loss function, I couldn't get any meaningful result.
When I train for, say, 50 epochs, the output values become identical for every 3D grid cell.

Yes, I guess some other part has an issue. Let me point out a possible one.

The shape of the output tensor is (32, 32, 32, 3, 5),
which is (# of cells in x-axis, # of cells in y-axis, # of cells in z-axis, # of anchors in each cell, # of values, i.e. x, y, z, r, c).

And the values inside this output tensor are repeated in every single cell.
For example, output[0, 0, 0, :, :] == output[l, m, n, :, :] for all l, m, and n. Identical!
This shouldn't happen if the model were trained properly.
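
As a quick sanity check, here is a minimal sketch (using a dummy NumPy array, not this repo's actual tensors) of how one can verify whether every grid cell really carries identical values:

import numpy as np

# Dummy output with the shape described above: (x, y, z, anchors, [x, y, z, r, c])
output = np.random.rand(32, 32, 32, 3, 5).astype(np.float32)

# True only if every cell repeats the values of cell (0, 0, 0),
# i.e. the degenerate behaviour described above.
print("all cells identical:", np.allclose(output, output[0, 0, 0]))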

I think this is due to the very heavily imbalanced positive-to-negative ratio inside the target tensor.
If a 3D patch contains one nodule, then looking inside the "target" tensor, only one value out of the 3 x 32 x 32 x 32 tensor is positive. That makes it a 1 : 98303 (3 x 32 x 32 x 32 - 1) imbalanced classification problem!

After enough training iterations, the model ends up predicting everything as negative.
That is my theory. I'm not sure it's the only issue, but I am quite sure it is at least one of the major ones.

To solve this, assigning multiple anchors to the GT nodule and randomly sampling the negative target cells would probably be necessary (a minimal sketch of such sampling is shown below).
I'm just not sure it would be smart to keep working on this code base instead of looking for and moving on to another.
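
Here is a minimal sketch of the random negative sampling meant above (a hypothetical target layout where channel 0 of the last axis is the class label; DeepLung's LabelMapping, quoted later in this thread, does something similar by keeping only num_neg negatives):

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target tensor: (32, 32, 32, anchors=3, 5); channel 0 is the class label.
target = np.zeros((32, 32, 32, 3, 5), dtype=np.float32)
target[10, 12, 7, 1, 0] = 1.0                  # a single positive anchor

num_neg = 800                                  # assumed number of negatives to keep per patch
neg_z, neg_h, neg_w, neg_a = np.nonzero(target[..., 0] == 0)
keep = rng.choice(len(neg_z), size=min(num_neg, len(neg_z)), replace=False)

# Mask for the loss: 1 = positive, 0 = sampled negative, -1 = ignored.
mask = -np.ones(target.shape[:4], dtype=np.float32)
mask[target[..., 0] == 1] = 1.0
mask[neg_z[keep], neg_h[keep], neg_w[keep], neg_a[keep]] = 0.0
print("positives:", int((mask == 1).sum()), "sampled negatives:", int((mask == 0).sum()))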

Could you tell me if you have a suggestion, or any other code base you would recommend?

naoe1999 commented on June 14, 2024

@MHansy

I didn't write test code for this model.
I ran into the problem described above when I finished training with this code base, which made me stop at that point.

Without solving that, testing is meaningless.
The test result would be a 0% detection score (FROC, recall, all the scores), because the model would predict every input as negative!

Anyway, this is the test scheme I was going to follow once the model gives meaningful output:

  1. Extract patches covering the whole lung volume from each validation CT scan.

  2. Get the output predictions from the trained model.

  3. Store all the positive predictions in a .csv file (same format as LUNA16's sampleSubmission.csv).
    NMS (non-maximum suppression) would be necessary for this step; a rough sketch follows below.

  4. Use the noduleCADEvaluationLUNA16.py file to get the test scores (FROC and so on).

You can download sampleSubmission.csv and noduleCADEvaluationLUNA16.py from LUNA16's official site.
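
For step 3, here is a hedged sketch of a simple distance-based 3D NMS over candidate detections and of writing them out in the LUNA16 submission format (seriesuid, coordX, coordY, coordZ, probability). The suppression radius and the toy detections are assumptions for illustration, not the official evaluation rule:

import csv
import numpy as np

def nms_3d(candidates, min_dist=5.0):
    # candidates: array of rows [x, y, z, diameter, probability] in world coordinates (mm)
    order = np.argsort(-candidates[:, 4])          # highest probability first
    kept = []
    for idx in order:
        c = candidates[idx]
        # suppress a candidate that lies too close to an already-kept, higher-scoring one
        if all(np.linalg.norm(c[:3] - k[:3]) > max(min_dist, 0.5 * k[3]) for k in kept):
            kept.append(c)
    return kept

def write_submission(rows, path='predictions.csv'):
    # rows: list of (seriesuid, [x, y, z, diameter, probability]) tuples
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['seriesuid', 'coordX', 'coordY', 'coordZ', 'probability'])
        for uid, c in rows:
            writer.writerow([uid, c[0], c[1], c[2], c[4]])

# Toy usage with made-up detections for one scan (hypothetical seriesuid):
cands = np.array([[10.0, 20.0, 30.0, 6.0, 0.9],
                  [11.0, 21.0, 29.0, 6.0, 0.7],   # near-duplicate, gets suppressed
                  [80.0, 15.0, 40.0, 8.0, 0.6]])
write_submission([('1.2.3.example.uid', c) for c in nms_3d(cands)])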

DishaDRao commented on June 14, 2024

@naoe1999 @MHansy

The problem of class imbalance is actually taken care of in the loss function. Even though the target labels may contain the (positive-to-negative) ratio you mentioned, the loss function handles this by employing 'hard negative mining' (similar to your idea of randomly sampling the negatives), which restricts the number of negative anchor boxes to 2 (depending on the batch size) per mini-batch. That means the network sees an approximately equal (or 1:2) positive-to-negative ratio of anchor boxes during the loss computation.
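
For reference, a minimal sketch of that hard-negative-mining idea (my own toy illustration, not the repo's exact loss code): from all negative anchor predictions, keep only the few with the highest predicted probability, so each sample contributes roughly as many negatives as positives to the classification loss.

import torch

def hard_negative_mining(neg_prob, num_hard=2):
    # keep only the num_hard highest-scoring (i.e. hardest) negative predictions
    if neg_prob.numel() <= num_hard:
        return neg_prob
    vals, _ = torch.topk(neg_prob, num_hard)
    return vals

# Toy example: many easy negatives plus a couple of hard ones
neg_prob = torch.cat([torch.full((98300,), 0.01), torch.tensor([0.92, 0.88, 0.40])])
print(hard_negative_mining(neg_prob, num_hard=2))   # tensor([0.9200, 0.8800])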

I strongly believe the problem in this code is how the rest of the targets are labelled. The anchor boxes for bounding-box regression should be labelled based on their IoU with a ground-truth box and a center-to-center parameterization (as in standard Faster R-CNN). I don't see how that is employed in this code.

If the target itself doesn't have the right (position) labels, then I wouldn't expect any meaningful results after training. (Giving the benefit of the doubt: even if the targets are labelled correctly, testing requires de-parameterizing the predictions, which can only be done once the target computation is deciphered.)
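
On the de-parameterization point: assuming the DeepLung-style encoding (quoted later in this thread), where a positive target is stored as [1, dz, dh, dw, dd] with dz = (z - oz) / anchor and dd = log(d / anchor), a prediction can be decoded back to coordinates roughly like this. This is a sketch under that assumption, not a ready-made test script:

import numpy as np

def decode_prediction(pred, cell_center, anchor):
    # pred: [score, dz, dh, dw, dd]; cell_center: (oz, oh, ow); anchor: anchor diameter
    score, dz, dh, dw, dd = pred
    z = dz * anchor + cell_center[0]
    h = dh * anchor + cell_center[1]
    w = dw * anchor + cell_center[2]
    d = np.exp(dd) * anchor
    return score, z, h, w, d

# Round-trip check against the encoding used in LabelMapping:
anchor, oz, oh, ow = 30.0, 50.0, 66.0, 82.0
target = np.array([47.0, 70.0, 90.0, 24.0])   # true z, h, w, diameter
encoded = [1.0,
           (target[0] - oz) / anchor,
           (target[1] - oh) / anchor,
           (target[2] - ow) / anchor,
           np.log(target[3] / anchor)]
print(decode_prediction(encoded, (oz, oh, ow), anchor))   # -> (1.0, 47.0, 70.0, 90.0, 24.0)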

In short, I wouldn't use this code for training. This repository is nice for understanding the preprocessing and augmentation parts, but for an actual implementation I would recommend checking out the original code bases (lfz/DSB2017 or 'wentaozhu/DeepLung'). They are extremely similar, but the latter repo is simpler; it worked for me!

(P.S. This repo provides a Google Colab at the end. However, I didn't use it or check it out; I wanted a deeper understanding, so I skipped it entirely. ;) )

SirMwan commented on June 14, 2024

Hello @DishaDRao and @naoe1999,

Kindly help, please.

I tried to follow up on your conversation and advice, and I went through the wentaozhu/DeepLung repository; unfortunately, in the loss code I find the same thing with the labels (+1).

But during training with that code, I found that the loss does not decrease. I am not sure whether I have to remove the (+1) on the labels in that code.

SirMwan commented on June 14, 2024

@DishaDRao did you refer to the part below? This is from the data.py file in the wentaozhu repository.

class LabelMapping(object):
    def __init__(self, config, phase):
        self.stride = np.array(config['stride'])
        self.num_neg = int(config['num_neg'])
        self.th_neg = config['th_neg']
        self.anchors = np.asarray(config['anchors'])
        self.phase = phase
        if phase == 'train':
            self.th_pos = config['th_pos_train']
        elif phase == 'val':
            self.th_pos = config['th_pos_val']

    def __call__(self, input_size, target, bboxes, filename):
        stride = self.stride
        num_neg = self.num_neg
        th_neg = self.th_neg
        anchors = self.anchors
        th_pos = self.th_pos

        output_size = []
        for i in range(3):
            if input_size[i] % stride != 0:
                print(filename)
            # assert(input_size[i] % stride == 0)
            output_size.append(int(input_size[i] / stride))  # changed: int cast

        # label[..., 0]: -1 = ignore, 0 = negative, 1 = positive
        label = -1 * np.ones(output_size + [len(anchors), 5], np.float32)  # changed from np.float32
        offset = ((stride.astype('float')) - 1) / 2
        oz = np.arange(offset, offset + stride * (output_size[0] - 1) + 1, stride)
        oh = np.arange(offset, offset + stride * (output_size[1] - 1) + 1, stride)
        ow = np.arange(offset, offset + stride * (output_size[2] - 1) + 1, stride)

        for bbox in bboxes:
            for i, anchor in enumerate(anchors):
                iz, ih, iw = select_samples(bbox, anchor, th_neg, oz, oh, ow)
                label[iz, ih, iw, i, 0] = 0

        if self.phase == 'train' and self.num_neg > 0:
            # keep only num_neg randomly sampled negatives (labelled -1); the rest become 0
            neg_z, neg_h, neg_w, neg_a = np.where(label[:, :, :, :, 0] == -1)
            neg_idcs = random.sample(range(len(neg_z)), min(num_neg, len(neg_z)))
            neg_z, neg_h, neg_w, neg_a = neg_z[neg_idcs], neg_h[neg_idcs], neg_w[neg_idcs], neg_a[neg_idcs]
            label[:, :, :, :, 0] = 0
            label[neg_z, neg_h, neg_w, neg_a, 0] = -1

        if np.isnan(target[0]):
            return label
        iz, ih, iw, ia = [], [], [], []
        for i, anchor in enumerate(anchors):
            iiz, iih, iiw = select_samples(target, anchor, th_pos, oz, oh, ow)
            iz.append(iiz)
            ih.append(iih)
            iw.append(iiw)
            ia.append(i * np.ones((len(iiz),), np.int64))
        iz = np.concatenate(iz, 0)
        ih = np.concatenate(ih, 0)
        iw = np.concatenate(iw, 0)
        ia = np.concatenate(ia, 0)
        flag = True
        if len(iz) == 0:
            pos = []
            for i in range(3):
                pos.append(max(0, int(np.round((target[i] - offset) / stride))))
            idx = np.argmin(np.abs(np.log(target[3] / anchors)))
            pos.append(idx)
            flag = False
        else:
            idx = random.sample(range(len(iz)), 1)[0]
            pos = [iz[idx], ih[idx], iw[idx], ia[idx]]
        # positive target: [1, dz, dh, dw, dd], normalized by the chosen anchor
        dz = (target[0] - oz[pos[0]]) / anchors[pos[3]]
        dh = (target[1] - oh[pos[1]]) / anchors[pos[3]]
        dw = (target[2] - ow[pos[2]]) / anchors[pos[3]]
        dd = np.log(target[3] / anchors[pos[3]])
        label[pos[0], pos[1], pos[2], pos[3], :] = [1, dz, dh, dw, dd]
        return label

DishaDRao commented on June 14, 2024

Hi,

If you're following the wentaozhu/DeepLung repository, you need not change anything in the loss function or in the data.py code. The negative samples are labelled correctly there. As mentioned in my previous comment, the +1 in the loss function is there to turn the negative labels (-1) into 0, so it serves a purpose!
Whereas in this repo (mostafa/Luna16), that +1 would be a mistake, because the negative samples are not labelled the way data.py does it in the other repo.

So, with wentaozhu's code, the loss problem you're facing must be due to something else, probably your dataset or training setup. Maybe you should look into their issues section.

SirMwan commented on June 14, 2024

@DishaDRao I am training on Google Colab. What I have done is reduce the batch size, and I am also not using DataParallel during training because I use a single GPU.

Furthermore, changes in the PyTorch version cause some issues, such as int casts that need to be added in some places. I have been at this for almost two months now and I am going crazy; I have started the process again and again with no success.

If you don't mind, please share your data.py, main.py, and layers.py files with me.
My email is [email protected]
Thanks in advance.

SirMwan commented on June 14, 2024

This is the change I have made in the training function:

def train(data_loader, net, loss, epoch, optimizer, get_lr, save_freq, save_dir):
    start_time = time.time()

    net.train()
    lr = get_lr(epoch)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    metrics = []

    for i, (data, target, coord) in enumerate(data_loader):
        if torch.cuda.is_available():
            data = Variable(data.cuda())
            target = Variable(target.cuda())
            coord = Variable(coord.cuda())
        data = data.float()
        target = target.float()
        coord = coord.float()

        optimizer.zero_grad()
        output = net(data, coord)
        loss_output = loss(output, target)
        loss_output[0].backward()
        optimizer.step()

        loss_output[0] = loss_output[0].item()    # changed this part
        metrics.append(loss_output)

    if epoch % args.save_freq == 0:
        state_dict = net.state_dict()
        for key in state_dict.keys():
            state_dict[key] = state_dict[key].cpu()

        torch.save({
            'epoch': epoch,
            'save_dir': save_dir,
            'state_dict': state_dict,
            'args': args},
            os.path.join(save_dir, '%03d.ckpt' % epoch))

    end_time = time.time()
    metrics = np.asarray(metrics, np.float32)
    print('Epoch %03d (lr %.5f)' % (epoch, lr))
    print('Train:      tpr %3.2f, tnr %3.2f, total pos %d, total neg %d, time %3.2f' % (
        100.0 * np.sum(metrics[:, 6]) / np.sum(metrics[:, 7]),
        100.0 * np.sum(metrics[:, 8]) / np.sum(metrics[:, 9]),
        np.sum(metrics[:, 7]),
        np.sum(metrics[:, 9]),
        end_time - start_time))
    print('loss %2.4f, classify loss %2.4f, regress loss %2.4f, %2.4f, %2.4f, %2.4f' % (
        np.mean(metrics[:, 0]),
        np.mean(metrics[:, 1]),
        np.mean(metrics[:, 2]),
        np.mean(metrics[:, 3]),
        np.mean(metrics[:, 4]),
        np.mean(metrics[:, 5])))
    print()

SirMwan commented on June 14, 2024

In the data file, I also made these changes:

...
else:
    imgs = np.load(self.filenames[idx])
    bboxes = self.sample_bboxes[idx]
    nz, nh, nw = imgs.shape[1:]
    pz = int(np.ceil(float(nz) / self.stride)) * self.stride
    ph = int(np.ceil(float(nh) / self.stride)) * self.stride
    pw = int(np.ceil(float(nw) / self.stride)) * self.stride
    imgs = np.pad(imgs, [[0, 0], [0, pz - nz], [0, ph - nh], [0, pw - nw]], 'constant', constant_values=self.pad_value)

    xx, yy, zz = np.meshgrid(np.linspace(-0.5, 0.5, int(imgs.shape[1] / self.stride)),   # added int
                             np.linspace(-0.5, 0.5, int(imgs.shape[2] / self.stride)),   # added int
                             np.linspace(-0.5, 0.5, int(imgs.shape[3] / self.stride)), indexing='ij')  # added int
    coord = np.concatenate([xx[np.newaxis, ...], yy[np.newaxis, ...], zz[np.newaxis, :]], 0).astype('float32')
    imgs, nzhw = self.split_comber.split(imgs)
    coord2, nzhw2 = self.split_comber.split(coord,
                                            side_len=int(self.split_comber.side_len / self.stride),
                                            max_stride=int(self.split_comber.max_stride / self.stride),
                                            margin=int(self.split_comber.margin / self.stride))
    assert np.all(nzhw == nzhw2)
    imgs = (imgs.astype(np.float32) - 128) / 128
    return torch.from_numpy(imgs), bboxes, torch.from_numpy(coord2), np.array(nzhw)
