
Comments (8)

stefanopini avatar stefanopini commented on May 18, 2024 1

Yes, but the images are too large to be stored in RAM (on ordinary machines), so you have to load them from disk; and since the samples are shuffled during training, you have to re-load the same image at different steps of each epoch.
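This lazy-loading pattern can be sketched without any framework (the class and field names here are made up for illustration, not the repo's actual dataset class): only lightweight annotations live in RAM, while pixels are re-read from disk every time a shuffled index comes up.

```python
import random

class LazyKeypointDataset:
    """Keeps only annotation metadata in memory; image data is (re)loaded
    on every access, as a shuffled sampler would trigger each epoch."""
    def __init__(self, annotations):
        self.annotations = annotations  # small dicts: path + keypoints
        self.load_count = {}            # tracks how often each image was read

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, idx):
        ann = self.annotations[idx]
        # in the real dataset this is where cv2.imread(ann['imgPath']) happens
        self.load_count[idx] = self.load_count.get(idx, 0) + 1
        return ann

data = LazyKeypointDataset([{'imgPath': 'img_%d.jpg' % i} for i in range(4)])
order = list(range(len(data))) * 2      # two epochs of indices
random.shuffle(order)                   # shuffled sampling across epochs
for idx in order:
    _ = data[idx]
# each image ends up loaded from "disk" once per epoch
```

Even though the order is random, every image is read exactly once per epoch, which is why disk I/O (not RAM) dominates.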

from simple-hrnet.

murdockhou avatar murdockhou commented on May 18, 2024 1

@stefanopini Sorry to bother you again. I see that when creating the dataset, you use the get_affine_transform and cv2.warpAffine functions to extract the single-person area (dataset/COCO.py, line 293). I'm a little confused: why don't you use a crop function to cut the person area directly out of the original image? Is there much difference between these two ways?


murdockhou avatar murdockhou commented on May 18, 2024 1


stefanopini avatar stefanopini commented on May 18, 2024

Hi!

Maybe I'm not getting the point of your question.
HRNet is a single-person HPE method, therefore input images should contain only one person (which is what happens both during training and testing).
MS COCO provides both the keypoint annotations and the person bounding boxes, so it is possible to create a separate example (with a different image crop) for each person using the keypoint and bounding box annotations.
Each annotation is added to a list in these lines:

self.data = []
# load annotations for each image of COCO
for imgId in tqdm(self.imgIds):
    ann_ids = self.coco.getAnnIds(imgIds=imgId, iscrowd=False)
    img = self.coco.loadImgs(imgId)[0]

    if self.use_gt_bboxes:
        objs = self.coco.loadAnns(ann_ids)

        # sanitize bboxes
        valid_objs = []
        for obj in objs:
            # Skip non-person objects (it should never happen)
            if obj['category_id'] != 1:
                continue

            # ignore objs without keypoints annotation
            if max(obj['keypoints']) == 0:
                continue

            x, y, w, h = obj['bbox']
            x1 = np.max((0, x))
            y1 = np.max((0, y))
            x2 = np.min((img['width'] - 1, x1 + np.max((0, w - 1))))
            y2 = np.min((img['height'] - 1, y1 + np.max((0, h - 1))))

            # Use only valid bounding boxes
            if obj['area'] > 0 and x2 >= x1 and y2 >= y1:
                obj['clean_bbox'] = [x1, y1, x2 - x1, y2 - y1]
                valid_objs.append(obj)

        objs = valid_objs

    else:
        objs = bboxes[imgId]

    # for each annotation of this image, add the formatted annotation to self.data
    for obj in objs:
        joints = np.zeros((self.nof_joints, 2), dtype=np.float)
        joints_visibility = np.ones((self.nof_joints, 2), dtype=np.float)

        if self.use_gt_bboxes:
            # COCO pre-processing
            # # Moved above
            # # Skip non-person objects (it should never happen)
            # if obj['category_id'] != 1:
            #     continue
            #
            # # ignore objs without keypoints annotation
            # if max(obj['keypoints']) == 0:
            #     continue

            for pt in range(self.nof_joints):
                joints[pt, 0] = obj['keypoints'][pt * 3 + 0]
                joints[pt, 1] = obj['keypoints'][pt * 3 + 1]
                t_vis = int(np.clip(obj['keypoints'][pt * 3 + 2], 0, 1))  # ToDo check correctness
                # COCO:
                #   if visibility == 0 -> keypoint is not in the image.
                #   if visibility == 1 -> keypoint is in the image BUT not visible (e.g. behind an object).
                #   if visibility == 2 -> keypoint looks clearly (i.e. it is not hidden).
                joints_visibility[pt, 0] = t_vis
                joints_visibility[pt, 1] = t_vis

        center, scale = self._box2cs(obj['clean_bbox'][:4])

        self.data.append({
            'imgId': imgId,
            'annId': obj['id'],
            'imgPath': os.path.join(self.root_path, self.data_version, '%012d.jpg' % imgId),
            'center': center,
            'scale': scale,
            'joints': joints,
            'joints_visibility': joints_visibility,
        })
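For reference, the COCO 'keypoints' field that the loop above parses is a flat list of (x, y, visibility) triplets, one per joint. A minimal standalone decode (with made-up values and only 3 joints for brevity) looks like:

```python
import numpy as np

# COCO stores keypoints as a flat list [x1, y1, v1, x2, y2, v2, ...] where
# v=0: not labeled, v=1: labeled but occluded, v=2: labeled and visible.
keypoints = [120, 60, 2,   0, 0, 0,   130, 70, 1]   # made-up 3-joint example
nof_joints = 3

joints = np.zeros((nof_joints, 2))
vis = np.zeros(nof_joints, dtype=int)
for pt in range(nof_joints):
    joints[pt] = keypoints[pt * 3], keypoints[pt * 3 + 1]
    # clip v to {0, 1} as the snippet above does: occluded joints (v=1)
    # are kept as supervision, unlabeled ones (v=0) are masked out
    vis[pt] = int(np.clip(keypoints[pt * 3 + 2], 0, 1))
```

Clipping the visibility flag means occluded-but-labeled joints contribute to the loss exactly like fully visible ones.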

Then each image is cropped (to extract a specific person) and rescaled with an affine warping in:
trans = get_affine_transform(c, s, self.pixel_std, r, self.image_size)
image = cv2.warpAffine(
    image,
    trans,
    (int(self.image_size[0]), int(self.image_size[1])),
    flags=cv2.INTER_LINEAR
)

Does this answer your question?


murdockhou avatar murdockhou commented on May 18, 2024


stefanopini avatar stefanopini commented on May 18, 2024

Hi @murdockhou !

The difference is that with warpAffine you can apply arbitrary affine transformations instead of just cropping the person area.
This is not useful during evaluation/testing, but it is used during training for data augmentation.
If you look at the previous lines of the file (L258-L296), you can see that when self.is_train is False, the parameters passed to get_affine_transform simply crop the image, while when self.is_train is True their values are modified to change the scale and to rotate and flip the person area for data augmentation.
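To make the crop-vs-affine point concrete, here is a minimal numpy sketch of building such a 2x3 affine matrix from a box center, size, and augmentation parameters. build_affine is a hypothetical helper for illustration, not the repo's get_affine_transform; the key property is that with rotation 0 and scale 1 it degenerates to a plain crop + resize.

```python
import numpy as np

def build_affine(center, box_size, out_size, rotation_deg=0.0, scale=1.0):
    """Hypothetical helper: 2x3 affine matrix mapping a (possibly rotated /
    rescaled) person box around `center` onto an output of size out_size."""
    w, h = out_size
    theta = np.deg2rad(rotation_deg)
    # half-extents of the source box after scale augmentation
    sw, sh = box_size[0] * scale / 2.0, box_size[1] * scale / 2.0
    cos, sin = np.cos(theta), np.sin(theta)
    # three source points: center, center + rotated x-axis, center + rotated y-axis
    src = np.float32([
        center,
        [center[0] + sw * cos, center[1] + sw * sin],
        [center[0] - sh * sin, center[1] + sh * cos],
    ])
    # their targets in the output image
    dst = np.float32([[w / 2, h / 2], [w, h / 2], [w / 2, h]])
    # solve the 6 affine coefficients from the 3 point correspondences
    A = np.zeros((6, 6))
    b = dst.reshape(-1).astype(np.float64)
    for i, (x, y) in enumerate(src):
        A[2 * i] = [x, y, 1, 0, 0, 0]
        A[2 * i + 1] = [0, 0, 0, x, y, 1]
    return np.linalg.solve(A, b).reshape(2, 3)

# rotation=0, scale=1 -> a plain crop + resize of a 64x128 box around (100, 80)
M = build_affine(center=(100, 80), box_size=(64, 128), out_size=(192, 256))
pt = M @ np.array([100.0, 80.0, 1.0])   # box center lands on output center
```

Passing a nonzero rotation_deg or scale != 1 changes only the matrix, not the warp call, which is why a single cv2.warpAffine covers both the plain-crop test path and the augmented training path.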
I hope it is clearer now.

Btw, I've adapted this code from the original implementation and some details are still unclear to me.
In particular, I don't know the meaning of the parameter pixel_std (see line 109).


valentin-fngr avatar valentin-fngr commented on May 18, 2024

Thank you both for clarifying my understanding.
One question: @stefanopini mentioned that we can only detect a single person per image.
When you say 'image', do you mean the cropped and rescaled bounding-box area of the image?


stefanopini avatar stefanopini commented on May 18, 2024

Hi @valentin-fngr , that's correct.
The HRNet model is designed as a top-down approach: person detection first (with almost any detector), then human pose estimation with HRNet on each single-person bounding-box area.
In contrast, HigherHRNet is a bottom-up multi-person approach.
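The top-down pipeline described above can be sketched with stubs (detect_people and estimate_pose are placeholders standing in for a real detector and for HRNet, not actual APIs):

```python
def detect_people(image):
    """Stub detector: returns person bounding boxes (x, y, w, h).
    In a real pipeline this would be e.g. YOLO or Faster R-CNN."""
    return [(10, 20, 50, 100), (80, 15, 40, 90)]

def estimate_pose(person_crop):
    """Stub single-person pose estimator standing in for HRNet:
    returns 17 COCO keypoints in crop-relative coordinates."""
    return [(0.5, 0.5)] * 17

def top_down_pose(image):
    """Top-down HPE: detect every person, then run the single-person
    estimator once per cropped/warped bounding-box area."""
    poses = []
    for (x, y, w, h) in detect_people(image):
        crop = ('crop', x, y, w, h)   # placeholder for the warped crop
        poses.append(estimate_pose(crop))
    return poses

result = top_down_pose(image=None)    # one pose per detected person
```

A bottom-up method like HigherHRNet inverts this: it predicts all keypoints on the full image first, then groups them into persons, so its cost does not scale with the number of detected boxes.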

