
psa's Introduction

Polarized Self-Attention: Towards High-quality Pixel-wise Regression

This is an official implementation of:

Huajun Liu, Fuqiang Liu, Xinyi Fan and Dong Huang. Polarized Self-Attention: Towards High-quality Pixel-wise Regression. arXiv version


Citation:

@article{Liu2021PSA,
  title={Polarized Self-Attention: Towards High-quality Pixel-wise Regression},
  author={Huajun Liu and Fuqiang Liu and Xinyi Fan and Dong Huang},
  journal={arXiv preprint arXiv:2107.00782},
  year={2021}
}

Code and pre-trained models will be uploaded soon.

Top-down 2D pose estimation models pre-trained on the MS-COCO keypoint task (Table 4 in the arXiv version).

| Model Name | Backbone | Input Size | AP | pth file |
| --- | --- | --- | --- | --- |
| UDP-Pose-PSA(p) | HRNet-W48 | 256x192 | 78.9 | to be uploaded |
| UDP-Pose-PSA(p) | HRNet-W48 | 384x288 | 79.5 | to be uploaded |
| UDP-Pose-PSA(s) | HRNet-W48 | 384x288 | 79.4 | to be uploaded |

Setup and inference:

Semantic segmentation models pre-trained on Cityscapes (Table 5 in the arXiv version).

| Model Name | Backbone | val mIoU | pth file |
| --- | --- | --- | --- |
| HRNetV2-OCR+PSA(p) | HRNetV2-W48 | 86.95 | download |
| HRNetV2-OCR+PSA(s) | HRNetV2-W48 | 86.72 | download |

Setup and inference:

psa's People

Contributors

dghuanggh


psa's Issues

It seems that the implementations of Channel-only self-attention and Spatial-only self-attention are swapped.

Thanks for sharing your work!

```python
def spatial_pool(self, x):
    input_x = self.conv_v_right(x)
    batch, channel, height, width = input_x.size()
    # [N, IC, H*W]
    input_x = input_x.view(batch, channel, height * width)
    # [N, 1, H, W]
    context_mask = self.conv_q_right(x)
    # [N, 1, H*W]
    context_mask = context_mask.view(batch, 1, height * width)
    # [N, 1, H*W]
    context_mask = self.softmax_right(context_mask)
    # [N, IC, 1]
    # context = torch.einsum('ndw,new->nde', input_x, context_mask)
    context = torch.matmul(input_x, context_mask.transpose(1, 2))
    # [N, IC, 1, 1]
    context = context.unsqueeze(-1)
    # [N, OC, 1, 1]
    context = self.conv_up(context)
    # [N, OC, 1, 1]
    mask_ch = self.sigmoid(context)
    out = x * mask_ch
    return out
```


It seems that the spatial_pool function actually implements the Channel-only self-attention module.
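For comparison, here is a minimal sketch of the two branches as the paper describes them: the channel-only branch softmaxes a 1-channel map over H*W and pools values into a per-channel gate, while the spatial-only branch softmaxes a globally pooled query over channels and produces a 1-channel spatial gate. Class and variable names here are illustrative, not the repo's.

```python
import torch
import torch.nn as nn

class ChannelOnlyAttention(nn.Module):
    """Channel-only branch: a 1-channel spatial map, softmaxed over H*W,
    pools the value features into a per-channel descriptor."""
    def __init__(self, channels):
        super().__init__()
        inner = channels // 2
        self.conv_q = nn.Conv2d(channels, 1, kernel_size=1)      # -> [N, 1, H, W]
        self.conv_v = nn.Conv2d(channels, inner, kernel_size=1)  # -> [N, C/2, H, W]
        self.conv_up = nn.Conv2d(inner, channels, kernel_size=1)
        self.softmax = nn.Softmax(dim=2)

    def forward(self, x):
        n, c, h, w = x.shape
        v = self.conv_v(x).view(n, c // 2, h * w)                # [N, C/2, HW]
        q = self.softmax(self.conv_q(x).view(n, 1, h * w))       # [N, 1, HW]
        ctx = torch.matmul(v, q.transpose(1, 2)).unsqueeze(-1)   # [N, C/2, 1, 1]
        mask = torch.sigmoid(self.conv_up(ctx))                  # [N, C, 1, 1]
        return x * mask                                          # channel-wise gate

class SpatialOnlyAttention(nn.Module):
    """Spatial-only branch: a globally pooled query, softmaxed over channels,
    weights the value features into a 1-channel spatial map."""
    def __init__(self, channels):
        super().__init__()
        inner = channels // 2
        self.conv_q = nn.Conv2d(channels, inner, kernel_size=1)
        self.conv_v = nn.Conv2d(channels, inner, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.softmax = nn.Softmax(dim=2)

    def forward(self, x):
        n, c, h, w = x.shape
        v = self.conv_v(x).view(n, c // 2, h * w)                 # [N, C/2, HW]
        q = self.softmax(self.pool(self.conv_q(x)).view(n, 1, c // 2))
        mask = torch.sigmoid(torch.matmul(q, v)).view(n, 1, h, w) # [N, 1, H, W]
        return x * mask                                           # pixel-wise gate
```

Under this reading, the quoted spatial_pool matches the ChannelOnlyAttention pattern (its mask has shape [N, OC, 1, 1], not [N, 1, H, W]).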

The result of HRNet+OCR

I can't find the 84.9% mIoU val result in the OCR paper. Is this a result you reproduced yourselves? If so, can you describe your implementation details?

How to use PSA in human pose estimation?

It's an honor to see such excellent work.

I used hrnetv2.py from the network to train on the MPII dataset, but the results did not change. How do I use the PSA module?

Looking forward to your human pose estimation code. Thank you!

PSA module missing in bottleneck block

The PSA module that is defined here is only used in the basic block of HRNet, not in the bottleneck block. The paper, however, says:

For any baseline networks with the bottleneck or basic residual blocks, such as ResNet and HRnet, we add PSAs after the first 3x3 convolution in every residual blocks, respectively.
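Following that quoted sentence, the placement would look roughly like the sketch below: the attention module sits directly on the output of the first 3x3 convolution inside the bottleneck. The class name and the `attn_layer` factory argument are illustrative placeholders, not the repo's code.

```python
import torch
import torch.nn as nn

class BottleneckWithAttn(nn.Module):
    """ResNet-style bottleneck with an attention module (e.g. PSA)
    inserted right after the first 3x3 convolution, as the paper describes."""
    expansion = 4

    def __init__(self, in_ch, mid_ch, attn_layer):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, mid_ch, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_ch)
        self.conv2 = nn.Conv2d(mid_ch, mid_ch, 3, padding=1, bias=False)  # the 3x3
        self.bn2 = nn.BatchNorm2d(mid_ch)
        self.attn = attn_layer(mid_ch)   # <-- PSA would go here
        self.conv3 = nn.Conv2d(mid_ch, mid_ch * self.expansion, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(mid_ch * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.down = (nn.Conv2d(in_ch, mid_ch * self.expansion, 1, bias=False)
                     if in_ch != mid_ch * self.expansion else nn.Identity())

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.attn(out)             # attention on the 3x3 output
        out = self.bn3(self.conv3(out))
        return self.relu(out + self.down(x))
```

Passing `lambda c: nn.Identity()` as `attn_layer` recovers a plain bottleneck, which makes it easy to A/B the module's effect.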

How to export the model as ONNX

Hello,

I'm interested in testing the HRNetV2-OCR+PSA(p) model with OpenCV, but I believe there is no way to load a .pth file directly. I'm looking into using PyTorch to load the network and then export it to ONNX, but it seems that I need the model definition to load the weights properly. Am I going in the right direction? Sorry if the question seems naive; I don't have much experience with ML, especially with PyTorch.

Thanks in advance
