
Comments (8)

stefanopini avatar stefanopini commented on May 18, 2024 1

Yes, but the images are too large to be stored in RAM (on ordinary machines), so you have to load them from disk; and since the samples are shuffled during training, you have to re-load the same image at different steps of each epoch.
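This lazy-loading pattern can be sketched without any framework (the class and field names here are made up for illustration, not the repo's actual dataset class): only lightweight annotations live in RAM, while pixels are re-read from disk every time a shuffled index comes up.

```python
import random

class LazyKeypointDataset:
    """Keeps only annotation metadata in memory; image data is (re)loaded
    on every access, as a shuffled sampler would trigger each epoch."""
    def __init__(self, annotations):
        self.annotations = annotations  # small dicts: path + keypoints
        self.load_count = {}            # tracks how often each image was read

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, idx):
        ann = self.annotations[idx]
        # in the real dataset this is where cv2.imread(ann['imgPath']) happens
        self.load_count[idx] = self.load_count.get(idx, 0) + 1
        return ann

data = LazyKeypointDataset([{'imgPath': 'img_%d.jpg' % i} for i in range(4)])
order = list(range(len(data))) * 2      # two epochs of indices
random.shuffle(order)                   # shuffled sampling across epochs
for idx in order:
    _ = data[idx]
# each image ends up loaded from "disk" once per epoch
```

Even though the order is random, every image is read exactly once per epoch, which is why disk I/O (not RAM) dominates.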

from simple-hrnet.

murdockhou avatar murdockhou commented on May 18, 2024 1

@stefanopini Sorry to bother you again. I see that when creating the dataset, you use the get_affine_transform and cv2.warpAffine functions to extract the single-person area (dataset/COCO.py, line 293). I'm a little confused: why don't you use a crop function to cut the person area directly out of the original image? Is there much difference between these two ways?


murdockhou avatar murdockhou commented on May 18, 2024 1


stefanopini avatar stefanopini commented on May 18, 2024

Hi!

Maybe I'm not getting the point of your question.
HRNet is a single-person HPE method, therefore input images should contain only one person (which is what happens both during training and testing).
MS COCO provides both the keypoint annotations and the person bounding boxes, so it is possible to create a separate example (with a different image crop) for each person using the keypoint and bounding box annotations.
Each annotation is added to a list in these lines:

self.data = []
# load annotations for each image of COCO
for imgId in tqdm(self.imgIds):
    ann_ids = self.coco.getAnnIds(imgIds=imgId, iscrowd=False)
    img = self.coco.loadImgs(imgId)[0]

    if self.use_gt_bboxes:
        objs = self.coco.loadAnns(ann_ids)

        # sanitize bboxes
        valid_objs = []
        for obj in objs:
            # Skip non-person objects (it should never happen)
            if obj['category_id'] != 1:
                continue

            # ignore objs without keypoints annotation
            if max(obj['keypoints']) == 0:
                continue

            x, y, w, h = obj['bbox']
            x1 = np.max((0, x))
            y1 = np.max((0, y))
            x2 = np.min((img['width'] - 1, x1 + np.max((0, w - 1))))
            y2 = np.min((img['height'] - 1, y1 + np.max((0, h - 1))))

            # Use only valid bounding boxes
            if obj['area'] > 0 and x2 >= x1 and y2 >= y1:
                obj['clean_bbox'] = [x1, y1, x2 - x1, y2 - y1]
                valid_objs.append(obj)

        objs = valid_objs

    else:
        objs = bboxes[imgId]

    # for each annotation of this image, add the formatted annotation to self.data
    for obj in objs:
        joints = np.zeros((self.nof_joints, 2), dtype=np.float)
        joints_visibility = np.ones((self.nof_joints, 2), dtype=np.float)

        if self.use_gt_bboxes:
            # COCO pre-processing
            # # Moved above
            # # Skip non-person objects (it should never happen)
            # if obj['category_id'] != 1:
            #     continue
            #
            # # ignore objs without keypoints annotation
            # if max(obj['keypoints']) == 0:
            #     continue

            for pt in range(self.nof_joints):
                joints[pt, 0] = obj['keypoints'][pt * 3 + 0]
                joints[pt, 1] = obj['keypoints'][pt * 3 + 1]
                t_vis = int(np.clip(obj['keypoints'][pt * 3 + 2], 0, 1))  # ToDo check correctness
                # COCO:
                #   if visibility == 0 -> keypoint is not in the image.
                #   if visibility == 1 -> keypoint is in the image BUT not visible (e.g. behind an object).
                #   if visibility == 2 -> keypoint looks clearly (i.e. it is not hidden).
                joints_visibility[pt, 0] = t_vis
                joints_visibility[pt, 1] = t_vis

        center, scale = self._box2cs(obj['clean_bbox'][:4])

        self.data.append({
            'imgId': imgId,
            'annId': obj['id'],
            'imgPath': os.path.join(self.root_path, self.data_version, '%012d.jpg' % imgId),
            'center': center,
            'scale': scale,
            'joints': joints,
            'joints_visibility': joints_visibility,
        })
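For reference, the COCO 'keypoints' field that the loop above parses is a flat list of (x, y, visibility) triplets, one per joint. A minimal standalone decode (with made-up values and only 3 joints for brevity) looks like:

```python
import numpy as np

# COCO stores keypoints as a flat list [x1, y1, v1, x2, y2, v2, ...] where
# v=0: not labeled, v=1: labeled but occluded, v=2: labeled and visible.
keypoints = [120, 60, 2,   0, 0, 0,   130, 70, 1]   # made-up 3-joint example
nof_joints = 3

joints = np.zeros((nof_joints, 2))
vis = np.zeros(nof_joints, dtype=int)
for pt in range(nof_joints):
    joints[pt] = keypoints[pt * 3], keypoints[pt * 3 + 1]
    # clip v to {0, 1} as the snippet above does: occluded joints (v=1)
    # are kept as supervision, unlabeled ones (v=0) are masked out
    vis[pt] = int(np.clip(keypoints[pt * 3 + 2], 0, 1))
```

Clipping the visibility flag means occluded-but-labeled joints contribute to the loss exactly like fully visible ones.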

Then each image is cropped (to extract a specific person) and rescaled with an affine warping in:
trans = get_affine_transform(c, s, self.pixel_std, r, self.image_size)
image = cv2.warpAffine(
    image,
    trans,
    (int(self.image_size[0]), int(self.image_size[1])),
    flags=cv2.INTER_LINEAR
)

Does this answer your question?


murdockhou avatar murdockhou commented on May 18, 2024


stefanopini avatar stefanopini commented on May 18, 2024

Hi @murdockhou !

The difference is that with warpAffine you can apply arbitrary affine transformations instead of just cropping the person area.
This is not useful during evaluation/testing, but it is used during training for data augmentation.
If you look at the previous lines of the file (L258-L296), you can see that when self.is_train is False, the parameters passed to get_affine_transform simply crop the image, while when self.is_train is True their values are modified to change the scale and to rotate and flip the person area for data augmentation.
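To make the crop-vs-affine point concrete, here is a minimal numpy sketch of building such a 2x3 affine matrix from a box center, size, and augmentation parameters. build_affine is a hypothetical helper for illustration, not the repo's get_affine_transform; the key property is that with rotation 0 and scale 1 it degenerates to a plain crop + resize.

```python
import numpy as np

def build_affine(center, box_size, out_size, rotation_deg=0.0, scale=1.0):
    """Hypothetical helper: 2x3 affine matrix mapping a (possibly rotated /
    rescaled) person box around `center` onto an output of size out_size."""
    w, h = out_size
    theta = np.deg2rad(rotation_deg)
    # half-extents of the source box after scale augmentation
    sw, sh = box_size[0] * scale / 2.0, box_size[1] * scale / 2.0
    cos, sin = np.cos(theta), np.sin(theta)
    # three source points: center, center + rotated x-axis, center + rotated y-axis
    src = np.float32([
        center,
        [center[0] + sw * cos, center[1] + sw * sin],
        [center[0] - sh * sin, center[1] + sh * cos],
    ])
    # their targets in the output image
    dst = np.float32([[w / 2, h / 2], [w, h / 2], [w / 2, h]])
    # solve the 6 affine coefficients from the 3 point correspondences
    A = np.zeros((6, 6))
    b = dst.reshape(-1).astype(np.float64)
    for i, (x, y) in enumerate(src):
        A[2 * i] = [x, y, 1, 0, 0, 0]
        A[2 * i + 1] = [0, 0, 0, x, y, 1]
    return np.linalg.solve(A, b).reshape(2, 3)

# rotation=0, scale=1 -> a plain crop + resize of a 64x128 box around (100, 80)
M = build_affine(center=(100, 80), box_size=(64, 128), out_size=(192, 256))
pt = M @ np.array([100.0, 80.0, 1.0])   # box center lands on output center
```

Passing a nonzero rotation_deg or scale != 1 changes only the matrix, not the warp call, which is why a single cv2.warpAffine covers both the plain-crop test path and the augmented training path.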
I hope it is clearer now.

Btw, I've adapted this code from the original implementation and some details are still unclear to me.
In particular, I don't know the meaning of the parameter pixel_std (see line 109).


valentin-fngr avatar valentin-fngr commented on May 18, 2024

Thank you both for clarifying my understanding.
One question: @stefanopini mentioned that we can only detect a single person per image.
When you say 'image', do you mean the cropped and rescaled bounding-box area of the image?


stefanopini avatar stefanopini commented on May 18, 2024

Hi @valentin-fngr , that's correct.
The HRNet model is designed as a top-down approach: person detection first (with almost any detector), then human pose estimation with HRNet on each single-person bounding-box area.
In contrast, HigherHRNet is a bottom-up multi-person approach.
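The top-down pipeline described above can be sketched with stubs (detect_people and estimate_pose are placeholders standing in for a real detector and for HRNet, not actual APIs):

```python
def detect_people(image):
    """Stub detector: returns person bounding boxes (x, y, w, h).
    In a real pipeline this would be e.g. YOLO or Faster R-CNN."""
    return [(10, 20, 50, 100), (80, 15, 40, 90)]

def estimate_pose(person_crop):
    """Stub single-person pose estimator standing in for HRNet:
    returns 17 COCO keypoints in crop-relative coordinates."""
    return [(0.5, 0.5)] * 17

def top_down_pose(image):
    """Top-down HPE: detect every person, then run the single-person
    estimator once per cropped/warped bounding-box area."""
    poses = []
    for (x, y, w, h) in detect_people(image):
        crop = ('crop', x, y, w, h)   # placeholder for the warped crop
        poses.append(estimate_pose(crop))
    return poses

result = top_down_pose(image=None)    # one pose per detected person
```

A bottom-up method like HigherHRNet inverts this: it predicts all keypoints on the full image first, then groups them into persons, so its cost does not scale with the number of detected boxes.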

