ilovepose / darkpose
Distribution-Aware Coordinate Representation for Human Pose Estimation
Home Page: https://ilovepose.github.io/coco
License: Apache License 2.0
I tried to write demo code, but I got stuck on how to interpret the output of the network:
import argparse
import os
import cv2
import numpy as np
import torch
import torchvision
import torchvision.transforms as transforms
from config import cfg
from config import update_config
from core.inference import get_final_preds
from utils.vis import save_debug_images
import glob
from models.pose_hrnet import get_pose_net
def parse_args():
    parser = argparse.ArgumentParser(description='Train keypoints network')
    # general
    parser.add_argument('--cfg',
                        help='experiment configure file name',
                        default='experiments/coco/hrnet/w48_384x288_adam_lr1e-3.yaml',
                        type=str)
    parser.add_argument('opts',
                        help="Modify config options using the command-line",
                        default=None,
                        nargs=argparse.REMAINDER)
    parser.add_argument('--modelDir',
                        help='model directory',
                        type=str,
                        default='')
    parser.add_argument('--logDir',
                        help='log directory',
                        type=str,
                        default='')
    parser.add_argument('--dataDir',
                        help='data directory',
                        type=str,
                        default='./Inputs/')
    parser.add_argument('--prevModelDir',
                        help='prev Model directory',
                        type=str,
                        default='')
    args = parser.parse_args()
    return args


def save_images(img, joints_pred, name):
    # img: BGR uint8 image of shape (H, W, 3) as returned by cv2.imread
    # joints_pred: array of shape (num_joints, 2) holding (x, y) pixel coordinates
    for joint in joints_pred:
        cv2.circle(img, (int(joint[0]), int(joint[1])), 2, [255, 0, 0], 2)
    os.makedirs("Results", exist_ok=True)
    cv2.imwrite(f"Results/{name}", img)


def main():
    normalize = transforms.Normalize(
        mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
    )
    transform = transforms.Compose([
        transforms.ToTensor(),
        normalize,
    ])
    args = parse_args()
    update_config(cfg, args)
    # cfg.MODEL.IMAGE_SIZE is stored as [width, height]
    image_size = np.array(cfg.MODEL.IMAGE_SIZE)
    model = get_pose_net(
        cfg, is_train=False
    )
    if cfg.TEST.MODEL_FILE:
        model.load_state_dict(torch.load(cfg.TEST.MODEL_FILE), strict=False)
    else:
        # final_output_dir was never defined in this script, so require an explicit checkpoint
        raise ValueError('Set TEST.MODEL_FILE to the checkpoint to load')
    model = torch.nn.DataParallel(model, device_ids=cfg.GPUS).cuda()
    model.eval()  # use running batch-norm statistics at inference time
    img_path_l = sorted(glob.glob('./Inputs/*'))
    with torch.no_grad():
        for path in img_path_l:
            name = os.path.basename(path)
            image = cv2.imread(path)
            # cv2.resize expects (width, height)
            image = cv2.resize(image, (int(image_size[0]), int(image_size[1])))
            if cfg.DATASET.COLOR_RGB:
                # cv2 loads BGR; the mean/std above are in RGB order
                image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            input = transform(image).unsqueeze(0)
            outputs = model(input)
            if isinstance(outputs, list):
                output = outputs[-1]
            else:
                output = outputs
            print(f"{name} : {output.shape}")


if __name__ == '__main__':
    main()
I don't know what to pass as scale and center to get_final_preds.
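For a whole image that is resized directly (no person box), one option is to treat the full image as the box. The sketch below assumes the repo follows the HRNet convention, where scale is the box size divided by 200 and heatmap coordinates are mapped back to the image using that center and scale. `PIXEL_STD`, `whole_image_center_scale` and `heatmap_to_image` are names made up for this illustration; they are not functions from the repo.

```python
import numpy as np

PIXEL_STD = 200.0  # assumption: HRNet-style convention, scale = box size / 200


def whole_image_center_scale(width, height):
    """Treat the full image as the person box."""
    center = np.array([width * 0.5, height * 0.5], dtype=np.float32)
    scale = np.array([width / PIXEL_STD, height / PIXEL_STD], dtype=np.float32)
    return center, scale


def heatmap_to_image(coords, center, scale, heatmap_size):
    """Map (x, y) heatmap coordinates back to pixels of the original image."""
    box = scale * PIXEL_STD                                   # box size in pixels
    step = box / np.asarray(heatmap_size, dtype=np.float32)   # pixels per heatmap cell
    return coords * step + center - box * 0.5


center, scale = whole_image_center_scale(640, 480)
# heatmap_size is (width, height), here 72 x 96 for a 288 x 384 input
pts = heatmap_to_image(np.array([[36.0, 48.0]]), center, scale, (72, 96))
print(pts)
```

With these values the middle of the heatmap maps to the middle of the 640 x 480 image. If the image is cropped to a person box first, center and scale would describe that box instead.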
In inference.py, in the function taylor, the second derivative dxy is calculated by:
dxy = 0.25 * (hm[py+1][px+1] - hm[py-1][px+1] - hm[py+1][px-1] + hm[py-1][px-1])
Could you explain why the equation has this form?
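This expression has the form of the standard central finite difference for a mixed partial derivative: take the central difference in x at rows py+1 and py-1, then take the central difference of those two results in y. Each central difference contributes a factor 1/2, which gives 0.25. A minimal check on a synthetic surface whose mixed derivative is known (the function name `mixed_second_diff` is made up for this sketch):

```python
import numpy as np


def mixed_second_diff(hm, px, py):
    """Central-difference estimate of d2(hm)/(dx dy) at column px, row py."""
    return 0.25 * (hm[py + 1][px + 1] - hm[py - 1][px + 1]
                   - hm[py + 1][px - 1] + hm[py - 1][px - 1])


# f(x, y) = x*y + 2*x^2 has mixed derivative d2f/(dx dy) = 1 everywhere
ys, xs = np.mgrid[0:5, 0:5].astype(np.float64)
hm = xs * ys + 2.0 * xs ** 2
dxy = mixed_second_diff(hm, 2, 2)
print(dxy)
```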
Excuse me, I am a beginner in CS. I would like to know when you will release your code. Sorry to disturb you.
When I tried to train the hourglass network with an input image size of 128 x 96, the code threw a tensor size mismatch error at this line:
DarkPose/lib/models/hourglass.py
Line 91 in 612fad5
Hi, I am new to DarkPose. Is there a link for downloading the pretrained model files?
In the paper you mention a "model-agnostic plugin" - will you open source the code for your approach?
I can't download the AI Challenger data (https://challenger.ai/dataset/keypoint).
How did you get the result for HRNet W48 * (with extra data)?
DarkPose/lib/core/inference.py
Line 51 in 0185b42
According to the Taylor series, the offset is equal to -InvH * g. In this code, however, the offset is equal to -2*InvH * g because of the two coefficients. Can you please explain it?
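As a reference point, the sketch below applies the step -InvH * g on a synthetic log-Gaussian heatmap, using standard unit-step central differences for g and H. It is not the code from inference.py; `taylor_refine` and the test values are assumptions for illustration. If the coefficients in question are finite-difference weights such as 0.5 for first derivatives and 0.25 for the mixed term, they belong to the derivative estimates and are not extra factors on the Taylor step.

```python
import numpy as np


def taylor_refine(log_hm, px, py):
    """One Newton step from the integer peak (px, py) on a log-heatmap."""
    dx = 0.5 * (log_hm[py, px + 1] - log_hm[py, px - 1])
    dy = 0.5 * (log_hm[py + 1, px] - log_hm[py - 1, px])
    dxx = log_hm[py, px + 1] - 2.0 * log_hm[py, px] + log_hm[py, px - 1]
    dyy = log_hm[py + 1, px] - 2.0 * log_hm[py, px] + log_hm[py - 1, px]
    dxy = 0.25 * (log_hm[py + 1, px + 1] - log_hm[py - 1, px + 1]
                  - log_hm[py + 1, px - 1] + log_hm[py - 1, px - 1])
    grad = np.array([dx, dy])
    hessian = np.array([[dxx, dxy], [dxy, dyy]])
    offset = -np.linalg.solve(hessian, grad)  # -InvH * g
    return np.array([px, py], dtype=np.float64) + offset


# Synthetic log-Gaussian with a sub-pixel centre
mu = np.array([10.3, 7.6])
sigma = 2.0
ys, xs = np.mgrid[0:16, 0:24].astype(np.float64)
log_hm = -((xs - mu[0]) ** 2 + (ys - mu[1]) ** 2) / (2.0 * sigma ** 2)

py, px = np.unravel_index(np.argmax(log_hm), log_hm.shape)
refined = taylor_refine(log_hm, px, py)
print((px, py), refined)
```

Because the log of a Gaussian is quadratic, these finite differences are exact and a single step from the integer peak lands on the true centre.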
Hi,
I ran test.py with the default HRNet w32 256 x 192 and HRNet w32 384 x 288 configurations. I am only able to reproduce the author's scores of 74.4 and 75.8, respectively.
Command Used :
python tools/test.py --cfg experiments/coco/hrnet/w32_256x192_adam_lr1e-3.yaml TEST.MODEL_FILE <MODEL_PATH> TEST.USE_GT_BBOX False
The MODEL_FILE used was the original author's model, and likewise for 384 x 288.
I observed that the function taylor(hm, coord) in lib/core/inference.py was being invoked, but I am not able to reproduce the results you report.
What do I need to do to reproduce the reported results?
Thanks in advance
DarkPose/lib/dataset/JointsDataset.py, lines 284-286
feat_stride = self.image_size / self.heatmap_size
mu_x = joint[0]
mu_y = joint[1]
My question is why the code is not written as follows:
feat_stride = self.image_size / self.heatmap_size
mu_x = joint[0]/feat_stride[0]
mu_y = joint[1]/feat_stride[1]
I searched JointsDataset.py and found that feat_stride is not used anywhere. If your heatmap is a 1/4 downsampling of the original image, I think using feat_stride is a necessary step.
I want to know whether my understanding is wrong or the code is wrong. Thanks.
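To make the two coordinate spaces concrete, here is a sketch of the conversion the question proposes, followed by a Gaussian target centred at the resulting sub-pixel location. Whether the division is needed depends on whether joint is already expressed in heatmap coordinates at that point in the repo, which this sketch does not settle. The helper names and sizes are assumptions for illustration.

```python
import numpy as np


def to_heatmap_coords(joint_xy, image_size, heatmap_size):
    """Convert (x, y) from input-image pixels to heatmap pixels."""
    feat_stride = (np.asarray(image_size, dtype=np.float64)
                   / np.asarray(heatmap_size, dtype=np.float64))
    return np.asarray(joint_xy, dtype=np.float64) / feat_stride


def gaussian_target(mu_xy, heatmap_size, sigma):
    """Gaussian heatmap of shape (height, width) centred at a sub-pixel mu."""
    width, height = heatmap_size
    xs = np.arange(width, dtype=np.float64)
    ys = np.arange(height, dtype=np.float64)[:, None]
    return np.exp(-((xs - mu_xy[0]) ** 2 + (ys - mu_xy[1]) ** 2) / (2.0 * sigma ** 2))


# Sizes are (width, height); 192 x 256 input with a 48 x 64 heatmap gives stride 4
mu = to_heatmap_coords([100.0, 60.0], image_size=(192, 256), heatmap_size=(48, 64))
target = gaussian_target(mu, heatmap_size=(48, 64), sigma=2.0)
peak_y, peak_x = np.unravel_index(np.argmax(target), target.shape)
print(mu, (peak_x, peak_y))
```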
Where is the DARK decoding part of the algorithm implemented? @xizero00
Have you tested the performance of DARK when flip test is not used? It seems that flip test should be used together with either DARK or standard shifting, but neither your paper nor the HRNet paper mentions this.
For video inference, what is the fastest achievable FPS?
There is one point I do not understand and would like to ask the author and other experts about. The paper describes inferring the true location "µ" from the location "m" of the maximum activation extracted from the heatmap. For a continuous function, shouldn't the first-order derivative at the maximum point m be 0?
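One way to read this, assuming m is the argmax over the integer pixel grid and P denotes the log-heatmap (the symbols P, D' and D'' are notation chosen for this note): the gradient vanishes at the continuous maximum µ, not at the grid point m, so D'(m) is generally non-zero whenever m differs from µ.

```latex
\begin{aligned}
\mathcal{P}(\boldsymbol{\mu}) &\approx \mathcal{P}(\boldsymbol{m})
  + \mathcal{D}'(\boldsymbol{m})^{\top}(\boldsymbol{\mu}-\boldsymbol{m})
  + \tfrac{1}{2}(\boldsymbol{\mu}-\boldsymbol{m})^{\top}\mathcal{D}''(\boldsymbol{m})(\boldsymbol{\mu}-\boldsymbol{m}) \\
\mathbf{0} = \mathcal{D}'(\boldsymbol{\mu}) &\approx \mathcal{D}'(\boldsymbol{m})
  + \mathcal{D}''(\boldsymbol{m})(\boldsymbol{\mu}-\boldsymbol{m}) \\
\boldsymbol{\mu} &\approx \boldsymbol{m} - \bigl(\mathcal{D}''(\boldsymbol{m})\bigr)^{-1}\mathcal{D}'(\boldsymbol{m})
\end{aligned}
```

For a Gaussian heatmap with standard deviation σ, the log-heatmap gradient at m is -(m - µ)/σ², which is zero only when m coincides with µ.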
Hi, thanks for your great work! I have tried training this model on my own dataset with ImageNet-pretrained weights. However, during training the loss is very low (about 0.00060 initially). Is this normal, and will it lead to vanishing gradients?
In evaluate.py, function calc_dists, the Euclidean distance will be calculated under the condition:
if target[n, c, 0] > 1 and target[n, c, 1] > 1:
It seems that you exclude the case where the target coordinates are in [0, 1]. Why is this done?
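For readers following along, here is a sketch of a distance computation with such a mask. It is not the repo's implementation: the -1 sentinel for skipped joints, the (joints, samples) output layout and the name `masked_dists` are assumptions for illustration.

```python
import numpy as np


def masked_dists(preds, target, normalize):
    """Normalised Euclidean distance per joint, skipping targets at or below 1.

    preds, target: arrays of shape (N, C, 2); normalize: array of shape (N, 2).
    Returns an array of shape (C, N), with -1 where the target was skipped.
    """
    valid = (target[..., 0] > 1) & (target[..., 1] > 1)
    diff = (preds - target) / normalize[:, None, :]
    dists = np.linalg.norm(diff, axis=-1)
    return np.where(valid, dists, -1.0).T


preds = np.array([[[10.0, 10.0], [5.0, 5.0]]])
target = np.array([[[13.0, 14.0], [0.0, 0.0]]])   # second joint has no annotation
normalize = np.ones((1, 2))
dists = masked_dists(preds, target, normalize)
print(dists)
```

In this sketch a target at (0, 0) is treated as "not annotated", so it is excluded rather than counted as an error.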
I want to test the model on the test-dev set, but tools/test.py only supports testing on the validation set (5000 images).
So my question is: how can I test the model on the COCO test-dev 2017 set and submit the JSON file to the CodaLab server?
I look forward to hearing from you. Thank you for your great work!
I am currently working on extending the OKS similarity metric to the MPII dataset. I haven't finished yet, so I may be unaware of some problems, but so far I haven't run into any. However, I wonder why no papers report AP and AR on the MPII dataset.
Why does everyone use PCK for MPII and AP for COCO? Is there a particular reason?
Also, if someone has already implemented this, could you share it in this thread?
Thanks.
Can I set up NMS on Windows?
It seems from your code that you are selectively discarding some annotations. If I understand correctly you look at the center of all the visible keypoints and the center of the bounding box annotation and measure the ks between these two points.
However, it is not clear how you selected the values used in this heuristic. In particular, what do the constants in
ks = np.exp(-1.0*(diff_norm2**2) / ((0.2)**2*2.0*area))
metric = (0.2 / 16) * num_vis + 0.45 - 0.2 / 16
correspond to, and what does the number 0.45 represent? Thanks a lot! Great work!
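Plugging numbers into the two expressions shows their range. The sketch below wraps them in hypothetical helpers (`center_similarity`, `selection_threshold`); how the two values are compared in the repo is not shown here and is left out on purpose.

```python
import numpy as np


def center_similarity(joints_center, bbox_center, area):
    """ks from the quoted expression: 1.0 when the two centres coincide."""
    diff_norm2 = np.linalg.norm(np.asarray(joints_center, dtype=np.float64)
                                - np.asarray(bbox_center, dtype=np.float64), 2)
    return np.exp(-1.0 * (diff_norm2 ** 2) / ((0.2) ** 2 * 2.0 * area))


def selection_threshold(num_vis):
    """metric from the quoted expression: linear in the number of visible joints."""
    return (0.2 / 16) * num_vis + 0.45 - 0.2 / 16


print(selection_threshold(1))    # value for a single visible joint
print(selection_threshold(17))   # value for all 17 COCO joints visible
print(center_similarity([50.0, 50.0], [50.0, 50.0], area=1000.0))
```

Arithmetically, the threshold starts at 0.45 for one visible joint and rises by 0.2/16 per additional joint, reaching 0.65 at 17 visible joints.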
Hello, where is the w48 256x192 DARK pre-trained model?
For the input size of 128x96, BLUR_KERNEL is 3. However, BLUR_KERNEL is 11 for the input size of 256x192. Can you explain the reason? Thanks.
Hi, thanks for your wonderful work. I still have one question. The Gaussian blur kernel size for the output heatmap is set to 11, which differs from that of the ground-truth heatmap, which is 1-3. The paper says: "Specifically, to match the requirement of our method we propose exploiting a Gaussian kernel K with the same variation as the training data to smooth out the effects of multiple peaks in the heatmap h".
So how should the kernel size be set? Thanks.
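A possible link between a kernel size and a sigma: OpenCV's getGaussianKernel documentation states that when sigma is non-positive, it is derived from the kernel size as 0.3*((ksize-1)*0.5 - 1) + 0.8. The sketch below only evaluates that documented formula. Whether the repo's blur is called with sigma left at 0 is an assumption, and `opencv_default_sigma` is a name made up here.

```python
def opencv_default_sigma(ksize):
    """Sigma OpenCV documents for a Gaussian kernel when sigma <= 0 is passed."""
    return 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8


sigma_for_11 = opencv_default_sigma(11)
print(sigma_for_11)
```

Under that assumption, a kernel size of 11 corresponds to a sigma of about 2, so a kernel size and a training sigma can describe the same amount of smoothing even though the numbers differ.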
Hello.
In paper, the learning rate is described as:
"the base learning rate was fine-tuned to 2.5e-4, and decayed to 2.5e-5 and 2.5e-6 at the 90-th and 120-th epoch".
In the repo, the initial learning rate is 0.001.
Which one is better? Should I change it to the value described in the paper in order to reproduce the results?
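For comparison, the schedule quoted from the paper can be written as a step function. This is a sketch, not the repo's scheduler: `lr_at_epoch` is a hypothetical helper, and it assumes the drop takes effect from the listed epoch onward.

```python
def lr_at_epoch(epoch, base_lr=2.5e-4, factor=0.1, steps=(90, 120)):
    """Step decay: multiply base_lr by factor once for each milestone reached."""
    drops = sum(1 for step in steps if epoch >= step)
    return base_lr * factor ** drops


for epoch in (0, 89, 90, 119, 120):
    print(epoch, lr_at_epoch(epoch))
```

Calling it with base_lr=1e-3 gives the corresponding schedule for the repo's initial learning rate, which makes the two settings easy to compare side by side.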
Hi, thanks for your great work.
In README.md, there is an evaluation result for HRNet+DARK trained on MSCOCO+AI Challenger with some other techniques (indicated with *-+).
Where can I find the pretrained weights of this best-performing model? I can't find them at the link you provided.
Thanks in advance.
Hi, thanks for your great work!
We would like to cite your paper. Could you give us the result of your model on the MPII test set?
I'd like to know if the authors plan to publish an inference script for the pre-trained models in Python.
Can it be used with bottom-up models?
I downloaded your project and tried to train Hourglass, but got the following error:
=> creating output/coco/hourglass/hg4_128x96_d256x3_adam_lr2
=> creating log/coco/hourglass/hg4_128x96_d256x3_adam_lr2_2021-08-16-12-55
Namespace(cfg='experiments/coco/hourglass/hg4_128x96_d256x3_adam_lr2.5e-4.yaml', dataDir='', logDir='', modelDir='', opts=[], prevModelDir='')
AUTO_RESUME: True
CUDNN:
  BENCHMARK: True
  DETERMINISTIC: False
  ENABLED: True
DATASET:
  COLOR_RGB: False
  DATASET: coco
  DATA_FORMAT: jpg
  FLIP: True
  HYBRID_JOINTS_TYPE:
  NUM_JOINTS_HALF_BODY: 8
  PROB_HALF_BODY: 0.0
  ROOT: data/coco
  ROT_FACTOR: 40
  SCALE_FACTOR: 0.3
  SELECT_DATA: False
  TEST_SET: val2017
  TRAIN_SET: train2017
DATA_DIR:
DEBUG:
  DEBUG: True
  SAVE_BATCH_IMAGES_GT: True
  SAVE_BATCH_IMAGES_PRED: True
  SAVE_HEATMAPS_GT: True
  SAVE_HEATMAPS_PRED: True
GPUS: (0,)
LOG_DIR: log
LOSS:
  TOPK: 8
  USE_DIFFERENT_JOINTS_WEIGHT: False
  USE_OHKM: False
  USE_TARGET_WEIGHT: True
MODEL:
  EXTRA:
    NUM_BLOCKS: 1
    NUM_FEATURES: 256
    NUM_STACKS: 4
  HEATMAP_SIZE: [24, 32]
  IMAGE_SIZE: [96, 128]
  INIT_WEIGHTS: False
  NAME: hourglass
  NUM_JOINTS: 17
  PRETRAINED: models/pytorch/imagenet/resnet50-19c8e357.pth
  SIGMA: 1
  TAG_PER_JOINT: True
  TARGET_TYPE: gaussian
OUTPUT_DIR: output
PIN_MEMORY: True
PRINT_FREQ: 100
RANK: 0
TEST:
  BATCH_SIZE_PER_GPU: 32
  BBOX_THRE: 1.0
  BLUR_KERNEL: 11
  COCO_BBOX_FILE: data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json
  FLIP_TEST: True
  IMAGE_THRE: 0.0
  IN_VIS_THRE: 0.2
  MODEL_FILE:
  NMS_THRE: 1.0
  OKS_THRE: 0.9
  POST_PROCESS: True
  SOFT_NMS: False
  USE_GT_BBOX: True
TRAIN:
  BATCH_SIZE_PER_GPU: 8
  BEGIN_EPOCH: 0
  CHECKPOINT:
  END_EPOCH: 140
  GAMMA1: 0.99
  GAMMA2: 0.0
  LR: 0.00025
  LR_FACTOR: 0.1
  LR_STEP: [90, 120]
  MOMENTUM: 0.9
  NESTEROV: False
  OPTIMIZER: adam
  RESUME: False
  SHUFFLE: True
  WD: 0.0001
WORKERS: 24
The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 3
Error occurs, No graph saved
Traceback (most recent call last):
File "/home/fl/dark/tools/train.py", line 223, in
main()
File "/home/fl/dark/tools/train.py", line 111, in main
writer_dict['writer'].add_graph(model, (dump_input, ))
File "/home/fl/miniconda3/envs/pose/lib/python3.6/site-packages/tensorboardX/writer.py", line 945, in add_graph
self._get_file_writer().add_graph(graph(model, input_to_model, verbose))
File "/home/fl/miniconda3/envs/pose/lib/python3.6/site-packages/torch/utils/tensorboard/_pytorch_graph.py", line 292, in graph
raise e
File "/home/fl/miniconda3/envs/pose/lib/python3.6/site-packages/torch/utils/tensorboard/_pytorch_graph.py", line 286, in graph
trace = torch.jit.trace(model, args)
File "/home/fl/miniconda3/envs/pose/lib/python3.6/site-packages/torch/jit/_trace.py", line 742, in trace
_module_class,
File "/home/fl/miniconda3/envs/pose/lib/python3.6/site-packages/torch/jit/_trace.py", line 940, in trace_module
_force_outplace,
File "/home/fl/miniconda3/envs/pose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 887, in _call_impl
result = self._slow_forward(*input, **kwargs)
File "/home/fl/miniconda3/envs/pose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 860, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/fl/dark/tools/../lib/models/hourglass.py", line 182, in forward
y = self.hgi
File "/home/fl/miniconda3/envs/pose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 887, in _call_impl
result = self._slow_forward(*input, **kwargs)
File "/home/fl/miniconda3/envs/pose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 860, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/fl/dark/tools/../lib/models/hourglass.py", line 95, in forward
return self._hour_glass_forward(self.depth, x)
File "/home/fl/dark/tools/../lib/models/hourglass.py", line 86, in _hour_glass_forward
low2 = self._hour_glass_forward(n-1, low1)
File "/home/fl/dark/tools/../lib/models/hourglass.py", line 86, in _hour_glass_forward
low2 = self._hour_glass_forward(n-1, low1)
File "/home/fl/dark/tools/../lib/models/hourglass.py", line 86, in _hour_glass_forward
low2 = self._hour_glass_forward(n-1, low1)
File "/home/fl/dark/tools/../lib/models/hourglass.py", line 91, in _hour_glass_forward
out = up1 + up2
RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 3
Process finished with exit code 1
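The 3 vs 2 mismatch in the traceback is consistent with simple size arithmetic. The sketch below assumes each hourglass level halves the feature map with floor division on the way down and doubles it on the way up, and that the hourglass operates on the 32 x 24 (height x width) heatmap-sized feature map from the config. Both are assumptions read off the log, not taken from the model source, and `hourglass_mismatch` is a hypothetical helper.

```python
def hourglass_mismatch(size, depth):
    """Return (level, skip_size, upsampled_size) at a level where they differ, else None."""
    for level in range(depth):
        pooled = size // 2        # downsample by 2, rounding down
        upsampled = pooled * 2    # upsample by 2 on the way back
        if upsampled != size:
            return level, size, upsampled
        size = pooled
    return None


print(hourglass_mismatch(24, 4))  # width:  24 -> 12 -> 6 -> 3, then 3 // 2 * 2 == 2
print(hourglass_mismatch(32, 4))  # height: 32 -> 16 -> 8 -> 4, every level matches
```

Under these assumptions, a width of 24 reaches 3 at the fourth level, where pooling and upsampling give 2. A feature-map size divisible by 2 raised to the hourglass depth would avoid the mismatch.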
In inference.py, in the function get_max_preds, I think the following line needs a - 1 appended:
preds[:, :, 0] = (preds[:, :, 0]) % width
so that it becomes:
preds[:, :, 0] = (preds[:, :, 0]) % width - 1
What is your opinion?
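A quick way to check this is to decode a flat argmax on a small array. The sketch assumes the index comes from an argmax over a row-major flattened heatmap with 0-based indexing, as in NumPy; `flat_argmax_to_xy` is a name made up for the check.

```python
import numpy as np


def flat_argmax_to_xy(heatmap):
    """Decode the flat argmax of a (height, width) heatmap into 0-based (x, y)."""
    height, width = heatmap.shape
    idx = int(np.argmax(heatmap.reshape(-1)))
    return idx % width, idx // width


hm = np.zeros((4, 6))
hm[2, 0] = 1.0                     # peak at row 2, column 0
x, y = flat_argmax_to_xy(hm)
print(x, y)

hm2 = np.zeros((4, 6))
hm2[3, 5] = 1.0                    # peak at the last row and column
x2, y2 = flat_argmax_to_xy(hm2)
print(x2, y2)
```

With 0-based indices, idx % width already gives the column. Subtracting 1 would turn a peak in column 0 into -1.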
I ran DarkPose's open-source code. The test result on HRNet is the same as the result without DARK, with no improvement. What is the reason? The configuration file and pretrained model are both taken from HRNet.