
pytorch-computer-vision-cookbook's Issues

Chapter 5 method xyxyh2xywh

In the implementation:

  def xyxyh2xywh(xyxy, img_size=416):
      xywh = torch.zeros(xyxy.shape[0], 6)
      xywh[:, 2] = (xyxy[:, 0] + xyxy[:, 2]) / 2. / img_size
      xywh[:, 3] = (xyxy[:, 1] + xyxy[:, 3]) / 2. / img_size
      xywh[:, 5] = (xyxy[:, 2] - xyxy[:, 0]) / img_size
      xywh[:, 4] = (xyxy[:, 3] - xyxy[:, 1]) / img_size
      xywh[:, 1] = xyxy[:, 6]
      return xywh

I think it should be:

  def xyxyh2xywh(xyxy, img_size=416):
      xywh = torch.zeros(xyxy.shape[0], 6)
      xywh[:, 2] = (xyxy[:, 0] + xyxy[:, 2]) / 2. / img_size
      xywh[:, 3] = (xyxy[:, 1] + xyxy[:, 3]) / 2. / img_size
      xywh[:, 4] = (xyxy[:, 2] - xyxy[:, 0]) / img_size
      xywh[:, 5] = (xyxy[:, 3] - xyxy[:, 1]) / img_size
      xywh[:, 1] = xyxy[:, 6]
      return xywh
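A quick sanity check of the swapped-index fix. The row layout `[x1, y1, x2, y2, obj_conf, cls_conf, cls_label]` and the output layout `[batch_ind, label, cx, cy, w, h]` are inferred assumptions here, not quoted from the book:

```python
import torch

# Assumed input layout per row: [x1, y1, x2, y2, obj_conf, cls_conf, cls_label].
# Assumed output layout: [batch_ind, label, cx, cy, w, h], coordinates
# normalized by img_size. Both layouts are inferred, not from the book.
def xyxyh2xywh(xyxy, img_size=416):
    xywh = torch.zeros(xyxy.shape[0], 6)
    xywh[:, 2] = (xyxy[:, 0] + xyxy[:, 2]) / 2. / img_size  # center x
    xywh[:, 3] = (xyxy[:, 1] + xyxy[:, 3]) / 2. / img_size  # center y
    xywh[:, 4] = (xyxy[:, 2] - xyxy[:, 0]) / img_size       # width
    xywh[:, 5] = (xyxy[:, 3] - xyxy[:, 1]) / img_size       # height
    xywh[:, 1] = xyxy[:, 6]                                 # class label
    return xywh

# one detection: corners (104, 52) and (312, 260), label 1
det = torch.tensor([[104., 52., 312., 260., 0.9, 0.8, 1.]])
out = xyxyh2xywh(det)
print(out[0].tolist())  # [0.0, 1.0, 0.5, 0.375, 0.5, 0.5]
```

With the original ordering, width and height would come out swapped, which silently corrupts every non-square box.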

Chapter 5 get_yolo_targets

Hello,

I don't understand how the code snippet below works. Can anyone explain it to me?
In 1), the shape of "obj_mask" is [8, 3, 13, 13] because I'm using a batch size of 8.
In 2), "batch_inds" and "best_anchor_ind" each have length 90 because the program found 90 bounding boxes in the batch.
So the mask's shape is [8, 3, 13, 13], but the indexing in 2) looks like it asks for shape [90, 90, 13, 13].
Is this right? The program runs without errors, but I'm wondering whether the code is correct.

  sizeT = batch_size, num_anchors, grid_size, grid_size
  obj_mask = torch.zeros(sizeT, device=device, dtype=torch.uint8)      # 1)
  noobj_mask = torch.ones(sizeT, device=device, dtype=torch.uint8)
  tx = torch.zeros(sizeT, device=device, dtype=torch.float32)
  ty = torch.zeros(sizeT, device=device, dtype=torch.float32)
  tw = torch.zeros(sizeT, device=device, dtype=torch.float32)
  th = torch.zeros(sizeT, device=device, dtype=torch.float32)

  sizeT = batch_size, num_anchors, grid_size, grid_size, num_cls
  tcls = torch.zeros(sizeT, device=device, dtype=torch.float32)

  target_bboxes = target[:, 2:] * grid_size
  t_xy = target_bboxes[:, :2]
  t_wh = target_bboxes[:, 2:]
  t_x, t_y = t_xy.t()
  t_w, t_h = t_wh.t()

  grid_i, grid_j = t_xy.long().t()

  iou_with_anchors = [get_iou_WH(anchor, t_wh) for anchor in anchors]
  iou_with_anchors = torch.stack(iou_with_anchors)
  best_iou_wa, best_anchor_ind = iou_with_anchors.max(0)

  batch_inds, target_labels = target[:, :2].long().t()
  obj_mask[batch_inds, best_anchor_ind, grid_j, grid_i] = 1            # 2)
  noobj_mask[batch_inds, best_anchor_ind, grid_j, grid_i] = 0
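For what it's worth, a toy reproduction of 2) suggests the shapes are fine: PyTorch pairs equal-length index tensors element-wise, so 90 index entries update 90 individual cells while the mask keeps its [batch, anchors, grid, grid] shape. The index values below are made up, not from the book:

```python
import torch

# Same mask shape as in the post; the four equal-length index tensors are
# paired element-wise (one cell per index tuple), so n entries set n cells
# and the mask's shape never changes.
obj_mask = torch.zeros(8, 3, 13, 13, dtype=torch.uint8)

batch_inds      = torch.tensor([0, 0, 1, 3, 7])   # stand-in for the 90 boxes
best_anchor_ind = torch.tensor([2, 1, 0, 2, 1])
grid_j          = torch.tensor([4, 6, 0, 12, 9])
grid_i          = torch.tensor([3, 3, 5, 12, 1])

obj_mask[batch_inds, best_anchor_ind, grid_j, grid_i] = 1
print(obj_mask.shape, int(obj_mask.sum()))  # shape unchanged; 5 cells set
```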

Chapter 6 - CUDA goes out of memory even with batch size 4.

I am not sure why, but even with a batch size of 4, CUDA goes out of memory. I think many unneeded objects (like gradients) are kept in memory, so it runs out. I have 6 GB of GPU memory; can you alter the code to clear unneeded objects and make it more memory-efficient?

I also tried a Kaggle kernel, and CUDA goes out of memory there as well.
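A generic sketch of the usual memory savers, in case it helps; `model`, `loader`, and `loss_func` are placeholders, not names from the book's code:

```python
import torch

# Generic evaluation loop with the common memory-hygiene measures applied.
def evaluate(model, loader, loss_func, device):
    model.eval()
    total = 0.0
    with torch.no_grad():                       # skip the autograd graph entirely
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            out = model(xb)
            total += loss_func(out, yb).item()  # .item() keeps a float, not a graph
            del xb, yb, out                     # free activations before the next batch
    if torch.cuda.is_available():
        torch.cuda.empty_cache()                # hand cached blocks back to the driver
    return total / len(loader)
```

During training, the same ideas apply: accumulate `loss.item()` rather than the loss tensor, and consider a smaller image size or gradient checkpointing if 6 GB is still not enough.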

Chapter 5 error

Hello,

I got some errors in Chapter 5:

[screenshot of the error]

Can you tell me how to solve this issue?

Thank you.

Training a YOLO model with a different image size

Hi,
I'm using the code from Chapter 5 as a guide to train a YOLO model, but I'm struggling to use an image size different from the 416 used in the book. I've edited the create_layers function so that it takes the image size as an input and passes it to YOLOLayer, but when I try to train a model, I get the error:

  File "", line 97, in forward
      x = torch.cat([layer_outputs[int(l_i)]
  RuntimeError: Sizes of tensors must match except in dimension 1. Got 34 and 33 in dimension 2

Do you have any suggestions for using a different image size? I could not find any details about this in the textbook either. Thank you.
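One likely culprit, stated as an assumption rather than a confirmed fix: Darknet-style backbones downsample by strides of 32, 16, and 8, so the route layers only concatenate cleanly when the input size is a multiple of the largest stride (32). A minimal pre-flight check; `check_yolo_img_size` is a hypothetical helper, not a function from the book:

```python
# Input sizes that are not multiples of 32 produce feature maps of
# mismatched sizes (e.g. 34 vs 33) at the concatenation points.
def check_yolo_img_size(img_size, max_stride=32):
    if img_size % max_stride != 0:
        nearest = round(img_size / max_stride) * max_stride
        raise ValueError(
            f"img_size={img_size} is not a multiple of {max_stride}; "
            f"the nearest valid size is {nearest}")
    return img_size

print(check_yolo_img_size(416))  # a valid size
# check_yolo_img_size(417) would raise ValueError
```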

Chapter 2: KeyError: '_labels'

  Traceback (most recent call last):
    File "C:\Users\jewoo\anaconda3\envs\totti\lib\site-packages\pandas\core\indexes\base.py", line 2646, in get_loc
      return self._engine.get_loc(key)
    File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
    File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
    File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
    File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
  KeyError: '_labels'

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "C:/Users/jewoo/PycharmProjects/vision/chapter/customDataset.py", line 37, in <module>
      histo_dataset = histoCancerDataset(data_dir, data_transformer, "train")
    File "C:/Users/jewoo/PycharmProjects/vision/chapter/customDataset.py", line 23, in __init__
      self.labels = [labels_df.loc[filename[:-4]].values[0] for filename in filenames]
    File "C:/Users/jewoo/PycharmProjects/vision/chapter/customDataset.py", line 23, in <listcomp>
      self.labels = [labels_df.loc[filename[:-4]].values[0] for filename in filenames]
    File "C:\Users\jewoo\anaconda3\envs\totti\lib\site-packages\pandas\core\indexing.py", line 1768, in __getitem__
      return self._getitem_axis(maybe_callable, axis=axis)
    File "C:\Users\jewoo\anaconda3\envs\totti\lib\site-packages\pandas\core\indexing.py", line 1965, in _getitem_axis
      return self._get_label(key, axis=axis)
    File "C:\Users\jewoo\anaconda3\envs\totti\lib\site-packages\pandas\core\indexing.py", line 625, in _get_label
      return self.obj._xs(label, axis=axis)
    File "C:\Users\jewoo\anaconda3\envs\totti\lib\site-packages\pandas\core\generic.py", line 3537, in xs
      loc = self.index.get_loc(key)
    File "C:\Users\jewoo\anaconda3\envs\totti\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
      return self._engine.get_loc(self._maybe_cast_indexer(key))
    File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
    File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
    File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
    File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
  KeyError: '_labels'
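One common cause, offered as a guess: a non-image file (for example a labels CSV) ends up in `filenames`, so `filename[:-4]` produces an id that the labels index does not contain. A minimal sketch of filtering by extension before the lookup; the file names below are illustrative:

```python
import os

# Keep only image files before stripping extensions; a stray CSV in the
# data folder would otherwise produce an id missing from the labels index.
filenames = ["img_001.tif", "img_002.tif", "train_labels.csv"]  # illustrative
image_files = [f for f in filenames if os.path.splitext(f)[1] == ".tif"]
ids = [os.path.splitext(f)[0] for f in image_files]
print(ids)  # ['img_001', 'img_002']
```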

NotADirectoryError

I downloaded the HMDB dataset and extracted the main folder, but I am getting a NotADirectoryError in chapter10.py when trying to set up path2ajpgs, and I'm sure the path is correct.

Chapter6-deployment Error

In the code, "path2train" has been used instead of "path2test", so renaming it fixes the error. This error is actually in the book itself. I am studying with this book, coding every single project, and also trying to make the code more advanced.

Chapter 10

It seems there is a mistake in the code for Chapter 10. Using the original dataset and code as in the book does not work: the model does not learn and is stuck at 1.98% accuracy.

Lacking data

I don't see any data files. Could you please upload those too? Thank you.

Chapter 10 Resnt18RNN model accuracy is 0.

Amazing work, thank you for sharing the code.
The accuracy of the model is 0 or 0.05 on the validation set.
What is the issue? I just copied your code.

Thank you for your time, looking forward to your reply.

Non max suppression run time issue (Chapter 05)

The non-max suppression code did not run; I had to change the line

  detections[0, :4] = (ww * detections[supp_inds, :4]).sum(0) / ww.sum()

into

  detections[0, :4] = (ww.view(-1, 1) * detections[supp_inds, :4]).sum(0) / ww.sum()
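For context, a minimal sketch of why the reshape is needed (toy tensors, not the book's data): `ww` has shape `[k]` while the boxes slice has shape `[k, 4]`, so the plain product either broadcasts along the wrong axis or fails outright; `ww.view(-1, 1)` gives shape `[k, 1]`, which broadcasts one weight per box row.

```python
import torch

# ww: one confidence weight per suppressed box, shape [k].
# boxes: the [k, 4] coordinate slice. [k] * [k, 4] aligns trailing dims,
# which is wrong (or errors when k != 4); [k, 1] * [k, 4] is per-row.
ww = torch.tensor([0.9, 0.6, 0.3])
boxes = torch.tensor([[0., 0., 10., 10.],
                      [1., 1., 11., 11.],
                      [2., 2., 12., 12.]])

weighted = (ww.view(-1, 1) * boxes).sum(0) / ww.sum()
print(weighted.shape)  # one merged box of 4 coordinates
```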

Error when running the module below

  from sklearn.model_selection import StratifiedShuffleSplit

  sss = StratifiedShuffleSplit(n_splits=2, test_size=0.5, random_state=42)
  train_indx, test_indx = next(sss.split(unique_ids, unique_labels))

  train_ids = [unique_ids[ind] for ind in train_indx]
  train_labels = [unique_labels[ind] for ind in train_indx]
  print(len(train_ids), len(train_labels))

  test_ids = [unique_ids[ind] for ind in test_indx]
  test_labels = [unique_labels[ind] for ind in test_indx]
  print(len(test_ids), len(test_labels))

  ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
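The error means at least one class in `unique_labels` occurs only once, so a stratified split cannot place it in both folds. A stdlib-only sketch for finding such classes; the label list below is a toy stand-in for `unique_labels`:

```python
from collections import Counter

# Classes with fewer than 2 members break StratifiedShuffleSplit; list them
# so they can be dropped or merged before splitting.
unique_labels = ["a", "a", "b", "c", "c", "c", "d"]  # toy stand-in
counts = Counter(unique_labels)
singletons = [cls for cls, n in counts.items() if n < 2]
print(singletons)  # classes to drop or merge before a stratified split
```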

Chapter 5 training code question/error?

The output of the YOLO layers uses the transform_outputs method to convert the box predictions into pixels.
When training, we need to revert this back to the original network output of the YOLO layer; the transform_bbox method is used for this. It does indeed revert the operations of transform_outputs, except for the last step of transform_outputs, where the boxes are scaled by the stride factor (32, 16, or 8, depending on the layer). When calculating the loss, we compare the targets computed via get_yolo_targets directly against the "reverted" YOLO network outputs. To me this looks wrong, as the reverted YOLO outputs still have the stride factor in them.
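A toy illustration of the scale mismatch described above (plain arithmetic, not the book's code): targets built at grid resolution and outputs left at pixel resolution differ by exactly the stride factor, so comparing them directly mixes scales.

```python
# Targets from get_yolo_targets live in grid units; an output reverted
# without dividing by the stride is still in pixels. Toy numbers below.
stride = 32          # stride of the coarsest YOLO layer
cx_pixels = 208.0    # a box center x in pixels
cx_grid = cx_pixels / stride
print(cx_grid)       # comparing cx_pixels against cx_grid is off by the stride
```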
