
Comments (14)

danamyu commented on July 23, 2024

Hi there, I updated the script to call compute_validation_loss at the validation epoch check, after compute_validation_map, but I keep getting an AttributeError (see below) because PyTorch is getting a list instead of a tensor.

According to issue #243, prepare_data should handle this list-to-tensor conversion, but it is already being called on the line right before out = net(images). Do you have any suggestions for what to try next? Thanks for your help!

In train.py:

        # This is done per epoch
        if args.validation_epoch > 0:
            if epoch % args.validation_epoch == 0 and epoch > 0:
                compute_validation_map(epoch, iteration, yolact_net, val_dataset, log if args.log else None)
                compute_validation_loss(yolact_net,val_data_loader,MultiBoxLoss, log if args.log else None)

The only other code I've changed is adding logging to compute_validation_loss (the script hasn't gotten that far yet):

def compute_validation_loss(net, data_loader, criterion, log:Log=None):
    global loss_types
    net = CustomDataParallel(NetLoss(net, criterion))
    with torch.no_grad():
        losses = {}

        # Don't switch to eval mode because we want to get losses
        iterations = 0
        for datum in data_loader:
            images, targets, masks, num_crowds = prepare_data(datum)
            out = net(images)

            wrapper = ScatterWrapper(targets, masks, num_crowds)
            _losses = criterion(out, wrapper, wrapper.make_mask())
            for k, v in _losses.items():
                v = v.mean().item()
                if k in losses:
                    losses[k] += v
                else:
                    losses[k] = v

            if log is not None:
                precision = 5
                loss_info = {k: round(losses[k].item(), precision) for k in losses}
                loss_info['T'] = round(loss.item(), precision)

                log.log('val', loss=loss_info, epoch=epoch, iter=iteration,
                        lr=round(cur_lr, 10), elapsed=elapsed)

            iterations += 1
            if args.validation_size <= iterations * args.batch_size:
                break

        for k in losses:
            losses[k] /= iterations

        loss_labels = sum([[k, losses[k]] for k in loss_types if k in losses], [])
        print(('Validation ||' + (' %s: %.3f |' * len(losses)) + ')') % tuple(loss_labels), flush=True)

Error:

<class 'list'>
Traceback (most recent call last):
  File "train.py", line 523, in <module>
    train()
  File "train.py", line 377, in train
    compute_validation_loss(yolact_net,val_data_loader,MultiBoxLoss, log if args.log else None)
  File "train.py", line 480, in compute_validation_loss
    out = net(images)
  File "C:\ProgramData\Anaconda3\envs\yolact-env-py37\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\dany\ml\yolact\yolact.py", line 568, in forward
    _, _, img_h, img_w = x.size()
AttributeError: 'list' object has no attribute 'size'


dbolya commented on July 23, 2024

I removed that function because I hadn't updated it in a long while. I'll add it back and update it though (check my latest commit).

To use, create a data_loader for val_dataset in the same way I create one for dataset and then call the function with the proper arguments.
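For example, mirroring the way the training loader is built (a sketch; detection_collate and args are the ones train.py already uses):

    # Sketch: a validation loader built the same way as the training loader.
    val_data_loader = data.DataLoader(val_dataset, args.batch_size,
                                      num_workers=args.num_workers,
                                      shuffle=True, collate_fn=detection_collate,
                                      pin_memory=True)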

Fair warning though, the evaluation will take a while (you might want to reduce the number of validation examples being evaluated, check the arguments to train.py).


konrad-ivelic commented on July 23, 2024

I don't know if it is still needed, but I have found that this works for me:

Creating a data loader for the validation set

val_data_loader = data.DataLoader(val_dataset, args.batch_size, 
                                  num_workers=args.num_workers, 
                                  shuffle=True, collate_fn=detection_collate, 
                                  pin_memory=True)

Creating a dictionary for the validation loss averages

    val_loss_avgs  = { k: MovingAverage(100) for k in loss_types }

Add this to calculate the loss on the validation set the same way it is done for the training loss. It goes after the for loop that computes the training loss and before the validation mAP calculation.

            # This is done per epoch
            if epoch > 0:
                # Calculate the loss on the validation dataset.
                print('Calculating validation losses, this may take a while...')
                
                val_iteration = 0
                
                for dat in val_data_loader:

                    #if len(dat[0]) < args.batch_size:

                    # Zero the grad to get ready to compute gradients
                    optimizer.zero_grad()

                    # Forward Pass + Compute loss at the same time (see CustomDataParallel and NetLoss)
                    losses = net(dat)
                    
                    losses = { k: (v).mean() for k,v in losses.items() } # Mean here because Dataparallel
                    loss = sum([losses[k] for k in losses])
                    
                    # no_inf_mean removes some components from the loss, so make sure to backward through all of it
                    # all_loss = sum([v.mean() for v in losses.values()])

                    # Backprop
                    loss.backward() # Do this to free up vram even if loss is not finite
                    if torch.isfinite(loss).item():
                        optimizer.step()
                    
                    # Add the loss to the moving average for bookkeeping
                    for k in losses:
                        val_loss_avgs[k].add(losses[k].item())
                
                    val_iteration += 1
                    if args.validation_size <= val_iteration * args.batch_size:
                        break      

                total = sum([val_loss_avgs[k].get_avg() for k in losses])
                loss_labels = sum([[k, val_loss_avgs[k].get_avg()] for k in loss_types if k in losses], [])
                    
                print(('Validation Loss ||' + (' %s: %.3f |' * len(losses)) + ' T: %.3f '+')') % tuple(loss_labels+[total]), flush=True)

                if args.log:
                    precision = 5
                    loss_info = {k: round(losses[k].item(), precision) for k in losses}
                    loss_info['T'] = round(loss.item(), precision)
                    log.log('val', loss=loss_info, epoch=epoch, iter=iteration)

I also hit an index out of range issue, which I fixed with this change to the loop in prepare_data. It fixes the problem without breaking anything else.

        for device, alloc in zip(devices, allocation):
            for _ in range(min(alloc,len(datum[0]))):
                images[cur_idx]  = gradinator(images[cur_idx].to(device))
                targets[cur_idx] = gradinator(targets[cur_idx].to(device))
                masks[cur_idx]   = gradinator(masks[cur_idx].to(device))
                cur_idx += 1

Using this I managed to get the validation loss calculated and added to the log 😃
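One caveat with the loop above: it calls backward() and optimizer.step() on validation batches, which means the weights get updated on validation data. If you want a purely read-only measurement, a torch.no_grad() variant along these lines should also work (just a sketch, reusing the wrapped net, MovingAverage, loss_types, and args from train.py):

    def run_validation_loss(net, val_data_loader):
        # net is the CustomDataParallel(NetLoss(...)) wrapper used in the
        # training loop, so it accepts the raw datum from the loader.
        val_loss_avgs = {}
        with torch.no_grad():  # no gradients, so no backward()/step() needed
            val_iteration = 0
            for datum in val_data_loader:
                losses = net(datum)
                losses = {k: v.mean() for k, v in losses.items()}  # mean over GPUs
                for k, v in losses.items():
                    val_loss_avgs.setdefault(k, MovingAverage(100)).add(v.item())
                val_iteration += 1
                if args.validation_size <= val_iteration * args.batch_size:
                    break
        # Print in the same style as the training loop.
        loss_labels = sum([[k, val_loss_avgs[k].get_avg()]
                           for k in loss_types if k in val_loss_avgs], [])
        total = sum(val_loss_avgs[k].get_avg() for k in val_loss_avgs)
        print(('Validation ||' + (' %s: %.3f |' * (len(loss_labels) // 2)) + ' T: %.3f')
              % tuple(loss_labels + [total]), flush=True)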


ridasalam commented on July 23, 2024

Thanks!


LukasMahieuArinti commented on July 23, 2024

Doesn't seem to be fixed for me. For those interested, here is what I think the code needs to be:

First create a new instance of the dataloader for the validation dataset:

data_loader_val = data.DataLoader(val_dataset, len(val_dataset), # WARNING: using the full length of val_dataset might cause a memory overflow...
                            num_workers=args.num_workers,
                            shuffle=True, collate_fn=detection_collate,
                            generator=torch.Generator(device='cuda'), 
                            pin_memory=True)

And then change your compute_validation_loss to this:

def compute_validation_loss(net, data_loader):
    #Calculates the loss on the validation dataset.
    print('Calculating validation losses, this may take a while...')

    global loss_types

    with torch.no_grad():
        losses = {}
        
        # Don't switch to eval mode here. Warning: this is viable but changes the interpretation of the validation loss.
        for datum in data_loader:
            losses = net.forward(datum)
            
            losses = { k: (v).mean() for k,v in losses.items() }
            loss = sum([losses[k] for k in losses]) 
        
        loss_labels = sum([[k, losses[k]] for k in loss_types if k in losses], [])
        print(('Validation Loss||' + (' %s: %.3f |' * len(losses)) + ')') % tuple(loss_labels), flush=True)
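To wire this in, call it at the existing per-epoch validation check, right after compute_validation_map (a sketch; the guard and variable names are the ones already in train.py):

        # Sketch: call site at the per-epoch validation check in train().
        if args.validation_epoch > 0:
            if epoch % args.validation_epoch == 0 and epoch > 0:
                compute_validation_map(epoch, iteration, yolact_net, val_dataset,
                                       log if args.log else None)
                compute_validation_loss(net, data_loader_val)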


ChangShanCheng commented on July 23, 2024

@LukasMahieuArinti
Hi, I tried the solution you suggested, but I still get the same type of error. Is there any other solution, please?

Error:

Traceback (most recent call last):
  File "train.py", line 530, in <module>
    train()
  File "train.py", line 378, in train
    compute_validation_loss(yolact_net, data_loader_val)
  File "train.py", line 503, in compute_validation_loss
    losses = net.forward(datum)
  File "/home/graduate/shancheng/yolact/yolact.py", line 566, in forward
    _, _, img_h, img_w = x.size()
AttributeError: 'list' object has no attribute 'size'


LukasMahieuArinti commented on July 23, 2024

Weird, works fine for me. Are you sure you are passing the correct arguments to the validation loss function?
The 'net' argument should be the Yolact model as defined in this piece of code:

yolact_net = Yolact()
net = yolact_net

Also, make sure you only call this function once per epoch, and only after at least one epoch has completed. It doesn't make much sense to call it more often or earlier.

If you want to look at some code, I recently noticed that the yolactedge repository has a very similar implementation for computing the validation loss.


bhuvanofc commented on July 23, 2024

@ChangShanCheng did you find a solution to 'AttributeError: 'list' object has no attribute 'size''? I am getting the same error when trying to calculate the validation loss.


ChangShanCheng commented on July 23, 2024

@bhuvanofc Not yet. I also confirmed that "net" is the Yolact model, so I'm still trying to figure out a way.


bhuvanofc commented on July 23, 2024

@ChangShanCheng Thank you, please let me know if you find any solution. Also, do you happen to know how I could calculate training accuracy for YOLACT++ during the training process?


bhuvanofc commented on July 23, 2024

@dbolya any solution for this error? I need the validation loss for my thesis.


peter-zhang-1020 commented on July 23, 2024

> (quoting @konrad-ivelic's solution in full; see the comment above)

I applied the last piece of code (the prepare_data change), but there is still an index out of range error. Do you know why? Looking forward to your reply.


peter-zhang-1020 commented on July 23, 2024

> (quoting @danamyu's original comment and traceback in full; see the top of the thread)

Have you solved the problem?


martoliod commented on July 23, 2024

@peter-zhang-1020 did you solve it yet?
Any help would be appreciated :)

Edit: I made it work by using @konrad-ivelic's solution, but setting shuffle=False in the val_data_loader.

Furthermore, I didn't use this part:

        for device, alloc in zip(devices, allocation):
            for _ in range(min(alloc, len(datum[0]))):
                images[cur_idx]  = gradinator(images[cur_idx].to(device))
                targets[cur_idx] = gradinator(targets[cur_idx].to(device))
                masks[cur_idx]   = gradinator(masks[cur_idx].to(device))
                cur_idx += 1

and it only works when using ResNet101, not ResNet50, as the backbone.

