
mnist_center_loss_pytorch's Introduction

UPDATE(Oct. 2018)

By dropping the bias of the last fc layer, as suggested in the issue, the centers tend to distribute around a circle, as reported in the original paper.

UPDATE(May. 2018)

Migration to PyTorch 0.4 done!

UPDATE(Apr. 2018)

Thanks to @wenfahu for completing the optimization of backward().

UPDATE(Mar. 2018)

The problems reported in the NOTIFICATION have now been SOLVED! Functionally, this repo is exactly the same as the official repo. The new result is shown below and looks similar to the former one. If you want to try the former version, please go back to the commits of Feb 12, 2018.

Some of the code computing Eq. 4 in backward() can and should be optimized to replace the for-loop; feel free to submit a pull request.

NOTIFICATION(Feb. 2018)

In the beginning, this was just a practice project to get familiar with PyTorch. To my surprise, many researchers have followed this repo of center loss, so I'd like to point out that this implementation is not exactly the same as the official one.

If you read the equations in the paper carefully, the definition of center loss in Eq. 2 only leads you to Eq. 3; the update equation of the centers in Eq. 4 cannot be inferred from it by the usual differentiation rules. Unless specified otherwise, the derivatives of a module are determined by its forward operation, following PyTorch's autograd strategy. Given the incompatibility of Eq. 3 and Eq. 4, only one of them can be implemented exactly, and I chose the latter. If you remove the centers_count in my code, this will lead you to Eq. 3.
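
For reference, the equations in question, transcribed here from the paper (see the paper for the authoritative forms):

\[
\mathcal{L}_C = \frac{1}{2}\sum_{i=1}^{m}\big\lVert x_i - c_{y_i}\big\rVert_2^2 \quad\text{(Eq. 2)}
\]
\[
\frac{\partial \mathcal{L}_C}{\partial x_i} = x_i - c_{y_i} \quad\text{(Eq. 3)}
\]
\[
\Delta c_j = \frac{\sum_{i=1}^{m}\delta(y_i = j)\,(c_j - x_i)}{1 + \sum_{i=1}^{m}\delta(y_i = j)} \quad\text{(Eq. 4)}
\]

As I read it, autograd applied to Eq. 2 would yield \(\partial\mathcal{L}_C/\partial c_j = \sum_i \delta(y_i = j)(c_j - x_i)\), i.e. Eq. 4 without the 1 + count denominator, which is exactly where the incompatibility comes from.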

This problem also exists in other implementations; its impact remains unknown but looks harmless.

TO DO: Specify the derivatives explicitly, just like the original Caffe repo, instead of having them calculated by the autograd system.
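
As a minimal sketch of what "specifying the derivatives" means in PyTorch (illustrative only, not this repo's exact code): subclass torch.autograd.Function and hand-write backward so the gradient returned for the centers follows Eq. 4 rather than whatever autograd would derive from the forward pass.

    import torch
    from torch.autograd import Function

    class CenterLossFuncSketch(Function):
        @staticmethod
        def forward(ctx, feature, label, centers):
            # feature: [batch, feat_dim], label: LongTensor [batch], centers: [num_classes, feat_dim]
            ctx.save_for_backward(feature, label, centers)
            centers_batch = centers.index_select(0, label)
            return (feature - centers_batch).pow(2).sum() / 2.0          # Eq. 2

        @staticmethod
        def backward(ctx, grad_output):
            feature, label, centers = ctx.saved_tensors
            centers_batch = centers.index_select(0, label)
            diff = centers_batch - feature
            grad_feature = -grad_output * diff                           # Eq. 3, scaled by the incoming gradient
            # Eq. 4: per-class sum of (c_j - x_i) divided by (1 + class count),
            # written with an explicit loop for clarity
            grad_centers = torch.zeros_like(centers)
            for j in range(centers.size(0)):
                mask = (label == j)
                grad_centers[j] = diff[mask].sum(0) / (1.0 + mask.sum().item())
            return grad_feature, None, grad_centers

    # usage: loss = CenterLossFuncSketch.apply(feature, label, centers)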

MNIST_center_loss_pytorch

A PyTorch implementation of center loss on MNIST, as a toy example of the ECCV 2016 paper A Discriminative Feature Learning Approach for Deep Face Recognition.

To ease the job of the classifier, center loss is designed to pull the samples of each class together in feature space.
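
The overall training objective, as in the paper, is the softmax loss plus a weighted center-loss term:

\[
\mathcal{L} = \mathcal{L}_S + \lambda\,\mathcal{L}_C
\]

where \(\lambda\) (loss_weight in this repo's code) balances the two terms.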

Results are shown below:

softmax loss and center loss (new)
softmax loss and center loss (old)
only softmax loss

The code also includes visualization of the training process; please wait until the GIFs load.

softmax loss and center loss (new)
softmax loss and center loss (old)
only softmax loss

mnist_center_loss_pytorch's People

Contributors

jxgu1016, wenfahu



mnist_center_loss_pytorch's Issues

About some details

In the code, the loss is calculated by:
loss = criterion[0](pred, target) + loss_weight * criterion[1](target, ip1)

Why not:
loss = criterion[0](pred, target) + loss_weight * criterion[1](target, ip1) / target.size(0)

how to update center?

Hi, thanks for providing such excellent work. I'd like to know how the class centers are updated.

other datasets?

Have you ever tested this with other datasets, like CIFAR?

About autograd

Hey, I noticed that you wrote the center loss with a backward designed by yourself. What would happen if you defined only the forward function and let PyTorch's autograd handle the rest? I wonder whether there is any difference between the two.

Center Loss backprop optimization

What about using a scatter op in PyTorch to replace the for-loop in the center loss backward? There is an undocumented op, torch.Tensor.scatter_add_.
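
For illustration, a minimal sketch (toy tensors, not this repo's code) of how scatter_add_ can accumulate the per-class sums that the for-loop currently computes:

    import torch

    # toy setup: 5 samples, 3 classes, 2-D features
    feature = torch.randn(5, 2)
    label = torch.tensor([0, 2, 1, 0, 2])
    centers = torch.randn(3, 2)

    diff = centers.index_select(0, label) - feature      # c_{y_i} - x_i for each sample

    # per-class sample counts; initialised to 1 to match the (1 + count) denominator of Eq. 4
    counts = torch.ones(3)
    counts.scatter_add_(0, label, torch.ones(5))

    # per-class sums of the differences, accumulated without a Python loop
    grad_centers = torch.zeros(3, 2)
    grad_centers.scatter_add_(0, label.unsqueeze(1).expand(5, 2), diff)
    grad_centers = grad_centers / counts.view(-1, 1)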

it doesn't seem to work well

The top-1 accuracy is around 11%, the NLL loss is around 0.03, and the center loss is around 0.0001 (it barely changes at all). How does this happen?

# -*- coding: utf-8 -*-

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable
from torch.utils.data import DataLoader
import torch.optim.lr_scheduler as lr_scheduler
from CenterLoss import CenterLoss
import matplotlib.pyplot as plt

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1_1 = nn.Conv2d(1, 32, kernel_size=5, padding=2)
        self.prelu1_1 = nn.PReLU()
        self.conv1_2 = nn.Conv2d(32, 32, kernel_size=5, padding=2)
        self.prelu1_2 = nn.PReLU()
        self.conv2_1 = nn.Conv2d(32, 64, kernel_size=5, padding=2)
        self.prelu2_1 = nn.PReLU()
        self.conv2_2 = nn.Conv2d(64, 64, kernel_size=5, padding=2)
        self.prelu2_2 = nn.PReLU()
        self.conv3_1 = nn.Conv2d(64, 128, kernel_size=5, padding=2)
        self.prelu3_1 = nn.PReLU()
        self.conv3_2 = nn.Conv2d(128, 128, kernel_size=5, padding=2)
        self.prelu3_2 = nn.PReLU()
        self.preluip1 = nn.PReLU()
        self.ip1 = nn.Linear(128*3*3, 2)
        self.ip2 = nn.Linear(2, 10)

    def forward(self, x):
        x = self.prelu1_1(self.conv1_1(x))
        x = self.prelu1_2(self.conv1_2(x))
        x = F.max_pool2d(x, 2)
        x = self.prelu2_1(self.conv2_1(x))
        x = self.prelu2_2(self.conv2_2(x))
        x = F.max_pool2d(x, 2)
        x = self.prelu3_1(self.conv3_1(x))
        x = self.prelu3_2(self.conv3_2(x))
        x = F.max_pool2d(x, 2)
        x = x.view(-1, 128*3*3)
        ip1 = self.preluip1(self.ip1(x))
        ip2 = self.ip2(ip1)
        return ip1, F.log_softmax(ip2)

'''
def visualize(feat, labels, epoch):
    plt.ion()
    c = ['#ff0000', '#ffff00', '#00ff00', '#00ffff', '#0000ff',
         '#ff00ff', '#990000', '#999900', '#009900', '#009999']
    plt.clf()
    for i in range(10):
        plt.plot(feat[labels == i, 0], feat[labels == i, 1], '.', c=c[i])
    plt.legend(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], loc='upper right')
    plt.xlim(xmin=-5, xmax=5)
    plt.ylim(ymin=-5, ymax=5)
    plt.text(-4.8, 4.6, "epoch=%d" % epoch)
    plt.savefig('./images/epoch=%d.jpg' % epoch)
    plt.draw()
    plt.pause(0.001)
'''

def main():
    use_cuda = torch.cuda.is_available()

    # Dataset
    trainset = datasets.MNIST('../data', download=True, train=True, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))]))
    train_loader = DataLoader(trainset, batch_size=128, shuffle=True, num_workers=4)

    testset = datasets.MNIST('../data', download=True, train=False, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))]))
    test_loader = DataLoader(testset, batch_size=128, shuffle=True, num_workers=4)

    # Model
    model = Net()
    print(model)

    # NLLLoss
    nllloss = nn.NLLLoss()  # CrossEntropyLoss = log_softmax + NLLLoss
    # CenterLoss
    loss_weight = 1
    centerloss = CenterLoss(10, 2)
    if use_cuda:
        nllloss = nllloss.cuda()
        centerloss = centerloss.cuda()
        model = model.cuda()
    criterion = [nllloss, centerloss]

    # optimizer4nn
    optimizer4nn = optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005)
    sheduler = lr_scheduler.StepLR(optimizer4nn, 20, gamma=0.8)

    # optimizer4center
    optimizer4center = optim.SGD(centerloss.parameters(), lr=0.5)

    for epoch in range(50):
        sheduler.step()
        print('epoch {}'.format(epoch + 1))
        # print(optimizer4nn.param_groups[0]['lr'])
        print("Training... Epoch = %d" % epoch)
        ip1_loader = []
        idx_loader = []
        train_loss = 0.
        train_acc = 0.
        train_nll = 0.
        train_cen = 0.
        for i, (data, target) in enumerate(train_loader):
            if use_cuda:
                data = data.cuda()
                target = target.cuda()
            data, target = Variable(data), Variable(target)

            ip1, pred = model(data)
            # out = torch.max(pred, 1)[1]
            loss = nllloss(pred, target) + loss_weight * centerloss(target, ip1)
            train_loss += loss.data[0]
            out = torch.max(pred, 1)[1]
            train_correct = (out == target).sum()
            train_acc += train_correct.data[0]
            train_nll += nllloss(pred, target).data[0]
            train_cen += centerloss(target, ip1).data[0]

            optimizer4nn.zero_grad()
            optimizer4center.zero_grad()

            loss.backward()

            optimizer4nn.step()
            optimizer4center.step()

            # ip1_loader.append(ip1)
            # idx_loader.append(target)

        print('Train Loss: {:.6f}, Acc: {:.6f}, nn Loss: {:.6f}, centerloss: {:.6f}'.format(
            train_loss / len(trainset), train_acc / len(trainset),
            train_nll / len(trainset), train_cen / len(trainset)))
        '''
        feat = torch.cat(ip1_loader, 0)
        labels = torch.cat(idx_loader, 0)
        visualize(feat.data.cpu().numpy(), labels.data.cpu().numpy(), epoch)
        '''
        model.eval()
        eval_loss = 0.
        eval_acc = 0.
        for i, (data, target) in enumerate(test_loader):
            if use_cuda:
                data = data.cuda()
                target = target.cuda()
            data, target = Variable(data), Variable(target)

            ip1, pred = model(data)
            # pred = torch.max(pred, 1)[1]
            loss = nllloss(pred, target) + loss_weight * centerloss(target, ip1)
            eval_loss += loss.data[0]
            out = torch.max(pred, 1)[1]
            eval_correct = (out == target).sum()
            eval_acc += eval_correct.data[0]

            # optimizer[0].zero_grad()
            # optimizer[1].zero_grad()
            # loss.backward()
            # optimizer[0].step()
            # optimizer[1].step()
            # ip1_loader.append(ip1)
            # idx_loader.append(target)

        print('Test Loss: {:.6f}, Acc: {:.6f}'.format(
            eval_loss / len(testset), eval_acc / len(testset)))

if __name__ == '__main__':
    main()

'''
epoch 9
Training... Epoch = 8
Train Loss: 0.017925, Acc: 0.112367
Test Loss: 0.018107, Acc: 0.113500
epoch 10
Training... Epoch = 9
Train Loss: 0.017911, Acc: 0.112367
Test Loss: 0.018094, Acc: 0.11350
'''

about loss

In my project's experiments, the center loss does not decrease with the number of iterations; it changes seemingly at random. I don't quite understand what's going on and would like to ask you about it. Thanks.

About figure of features

I wonder how to change the code to plot the figure of features trained with only the softmax loss.
Also, what do you think about the difference in shape between Fig. 3 in the original paper and the figure produced by your code?

Question about NLLLoss and CrossEntropyLoss

Hi, I see that the code uses F.log_softmax() and nn.NLLLoss() to compute the softmax cross-entropy loss.
I used nn.CrossEntropyLoss() instead, but the result is very bad: the test accuracy is only 93%.
Could you please tell me what's wrong? Thank you!
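
For reference, nn.CrossEntropyLoss combines log_softmax and NLLLoss in one module. Since Net.forward in this repo already applies F.log_softmax, feeding its output to CrossEntropyLoss applies log_softmax twice, which is a likely source of trouble. A minimal sketch of the two consistent pairings (toy tensors, illustrative only):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    logits = torch.randn(4, 10)            # raw scores from the last fc layer
    target = torch.tensor([1, 0, 3, 9])

    # pairing A: model outputs log-probabilities -> use NLLLoss (what this repo does)
    log_probs = F.log_softmax(logits, dim=1)
    loss_a = nn.NLLLoss()(log_probs, target)

    # pairing B: model outputs raw logits -> use CrossEntropyLoss
    loss_b = nn.CrossEntropyLoss()(logits, target)

    print(torch.allclose(loss_a, loss_b))  # True: the two pairings are equivalent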

gradient needs to be divided by batch size

It appears that the gradient is not being divided by the batch size in CenterlossFunc().

I changed it to:

@staticmethod
def forward(ctx, feature, label, centers):
    ctx.save_for_backward(feature, label, centers)
    centers_batch = centers.index_select(0, label.long())
    # Eq. 2, averaged over the batch
    return (feature - centers_batch).pow(2).sum() / 2.0 / feature.size()[0]


@staticmethod
def backward(ctx, grad_output):
    feature, label, centers = ctx.saved_tensors
    centers_batch = centers.index_select(0, label.long())
    diff = centers_batch - feature
    # init every iteration
    counts = centers.new(centers.size(0)).fill_(1)
    ones = centers.new(label.size(0)).fill_(1)
    grad_centers = centers.new(centers.size()).fill_(0)

    # per-class counts and per-class sums of (c_j - x_i), i.e. Eq. 4
    counts = counts.scatter_add_(0, label.long(), ones)
    grad_centers.scatter_add_(0, label.unsqueeze(1).expand(feature.size()).long(), diff)
    grad_centers = grad_centers/(counts.view(-1, 1))
    # gradient w.r.t. the features is also averaged over the batch
    return - grad_output.data * diff / feature.size()[0], None, grad_centers

Figure now looks like: [feature visualization at epoch 90]

Only implemented softmax

loss = nllloss(pred, target) + loss_weight * centerloss(target, ip1)
With both the center loss and the softmax loss, the result is correct.
When I remove the center loss and use loss = nllloss(pred, target), the result is incorrect.

gradient needs to be divided by batch size v2

This issue has been raised before in #9.

But I find that the averaging only affects the gradient of Xi; the gradient of the centers is not divided by the batch size.

For example: suppose the gradient of Xi is Grad_Xi and the gradient of a center is Grad_center when the loss is L0. With the loss scaled to L0/10, the gradient of Xi becomes Grad_Xi/10, but the gradient of the center is still Grad_center.

Could you help check this?

question about center loss item

Excuse me, the formula says that Xi is a feature whose dimension changes with the network depth. How does the code reflect this? I can't see it. Is feat_dim set to the input dimension, and can it ensure that the optimized output dimension is the final one? Please answer my question, thank you.

about optimizer4nn & optimizer4center

Hello, thanks for your code; it has been very helpful for learning center loss. However, I don't understand what optimizer4center and optimizer4nn are for. Thanks for your reply.
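
For context, a minimal sketch of the two-optimizer pattern used in the training script quoted earlier (Net is the CNN and CenterLoss the module holding the class centers; the learning rates are the ones used above): optimizer4nn updates the network weights, while optimizer4center updates only the centers.

    import torch.optim as optim

    model = Net()                     # CNN producing the 2-D features
    centerloss = CenterLoss(10, 2)    # holds the 10 class centers as learnable parameters

    # drives the network weights with the combined softmax + weighted center loss
    optimizer4nn = optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005)
    # updates only the centers, with its own (much larger) learning rate
    optimizer4center = optim.SGD(centerloss.parameters(), lr=0.5)

    # each iteration: zero both optimizers, backprop the combined loss once, then step both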

visualize feature center

When visualizing the feature centers, why do you use the first and second dimensions of the feature as x and y?


Migration to 0.4

In PyTorch 0.4, Tensor and Variable are merged, so we need to migrate.
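
As a rough, illustrative sketch of what that migration involves (not this repo's actual diff): Variable wrappers are no longer needed, and scalar losses are read with .item() instead of .data[0].

    import torch

    # PyTorch 0.4 style: tensors carry autograd state directly, no Variable(...) wrapper
    x = torch.randn(3, 2, requires_grad=True)
    loss = (x ** 2).sum()
    loss.backward()

    print(loss.item())     # replaces loss.data[0] for reading a scalar value
    print(x.grad.shape)    # gradients live on the tensor itself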

About the equation(4)

Excuse me, I note that the gradient of Lc with respect to Ci computed from equation (2) is nearly the same as equation (4). So, could we update Ci through the automatic differentiation system as follows?

    class CenterLoss(nn.Module):
        def forward(self, label, feat):
            ...
            return Lc / batch_size    # equation (2)

    L = L1 + Lc
    L.backward()
