
mnist_center_loss_pytorch's Introduction

UPDATE(Oct. 2018)

By dropping the bias of the last fc layer, as suggested in the issue, the centers tend to distribute around a circle, as reported in the original paper.

UPDATE(May. 2018)

Migration to PyTorch 0.4 done!

UPDATE(Apr. 2018)

Thanks to @wenfahu for completing the optimization of backward().

UPDATE(Mar. 2018)

The problems reported in the NOTIFICATION have now been SOLVED! Functionally, this repo is exactly the same as the official repo. The new result is shown below and looks similar to the former one. If you want to try the former version, please go back to the commits of Feb 12, 2018.

Some of the code computing Eq. 4 in backward() can and should be optimized to replace the for-loop; feel free to submit a pull request.

NOTIFICATION(Feb. 2018)

In the beginning, this was just a practice project to get familiar with PyTorch. To my surprise, many researchers have followed this repo of center loss, so I'd like to point out that this implementation is not exactly the same as the official one.

If you read the equations in the paper carefully, the definition of center loss in Eq. 2 only leads you to Eq. 3; the update equation of the centers in Eq. 4 cannot be inferred from it by the usual differentiation rules. Unless specified otherwise, the derivatives of a module are determined by its forward operation, following PyTorch's autograd strategy. Given the incompatibility of Eq. 3 and Eq. 4, only one of them can be implemented exactly, and I chose the latter. If you remove the centers_count in my code, this will lead you to Eq. 3.
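
For reference, the equations in question, transcribed here from the paper (see the paper for the authoritative forms):

\[
\mathcal{L}_C = \frac{1}{2}\sum_{i=1}^{m}\big\lVert x_i - c_{y_i}\big\rVert_2^2 \quad\text{(Eq. 2)}
\]
\[
\frac{\partial \mathcal{L}_C}{\partial x_i} = x_i - c_{y_i} \quad\text{(Eq. 3)}
\]
\[
\Delta c_j = \frac{\sum_{i=1}^{m}\delta(y_i = j)\,(c_j - x_i)}{1 + \sum_{i=1}^{m}\delta(y_i = j)} \quad\text{(Eq. 4)}
\]

As I read it, autograd applied to Eq. 2 would yield \(\partial\mathcal{L}_C/\partial c_j = \sum_i \delta(y_i = j)(c_j - x_i)\), i.e. Eq. 4 without the 1 + count denominator, which is exactly where the incompatibility comes from.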

This problem also exists in other implementations; its impact remains unknown but looks harmless.

TO DO: Specify the derivatives explicitly, just like the original Caffe repo, instead of having them calculated by the autograd system.
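
As a minimal sketch of what "specifying the derivatives" means in PyTorch (illustrative only, not this repo's exact code): subclass torch.autograd.Function and hand-write backward so the gradient returned for the centers follows Eq. 4 rather than whatever autograd would derive from the forward pass.

    import torch
    from torch.autograd import Function

    class CenterLossFuncSketch(Function):
        @staticmethod
        def forward(ctx, feature, label, centers):
            # feature: [batch, feat_dim], label: LongTensor [batch], centers: [num_classes, feat_dim]
            ctx.save_for_backward(feature, label, centers)
            centers_batch = centers.index_select(0, label)
            return (feature - centers_batch).pow(2).sum() / 2.0          # Eq. 2

        @staticmethod
        def backward(ctx, grad_output):
            feature, label, centers = ctx.saved_tensors
            centers_batch = centers.index_select(0, label)
            diff = centers_batch - feature
            grad_feature = -grad_output * diff                           # Eq. 3, scaled by the incoming gradient
            # Eq. 4: per-class sum of (c_j - x_i) divided by (1 + class count),
            # written with an explicit loop for clarity
            grad_centers = torch.zeros_like(centers)
            for j in range(centers.size(0)):
                mask = (label == j)
                grad_centers[j] = diff[mask].sum(0) / (1.0 + mask.sum().item())
            return grad_feature, None, grad_centers

    # usage: loss = CenterLossFuncSketch.apply(feature, label, centers)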

MNIST_center_loss_pytorch

A PyTorch implementation of center loss on MNIST, as a toy example of the ECCV 2016 paper A Discriminative Feature Learning Approach for Deep Face Recognition.

To ease the job of the classifier, center loss is designed to pull the samples of each class together in feature space.
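
The overall training objective, as in the paper, is the softmax loss plus a weighted center-loss term:

\[
\mathcal{L} = \mathcal{L}_S + \lambda\,\mathcal{L}_C
\]

where \(\lambda\) (loss_weight in this repo's code) balances the two terms.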

Results are shown below:

softmax loss and center loss (new)
softmax loss and center loss (old)
only softmax loss

The code also includes visualization of the training process; please wait until the GIFs load.

softmax loss and center loss (new)
softmax loss and center loss (old)
only softmax loss

mnist_center_loss_pytorch's People

Contributors

jxgu1016, wenfahu



mnist_center_loss_pytorch's Issues

About some details

In the code, the loss is calculated by:
loss = criterion[0](pred, target) + loss_weight * criterion[1](target, ip1)

Why not:
loss = criterion[0](pred, target) + loss_weight * criterion[1](target, ip1) / target.size(0)

how to update center?

Hi, thanks for providing such excellent work. I'd like to know how the class centers are updated.

other datasets?

Have you ever tested this with other datasets, like CIFAR?

About autograd

Hey, I noticed that you wrote the center loss with a backward designed by yourself. What would happen if you defined only the forward function and let PyTorch's autograd handle the rest? I wonder whether there is any difference between the two.

Center Loss backprop optimization

What about using a scatter op in PyTorch to replace the for-loop in the center loss backward? There is an undocumented op, torch.Tensor.scatter_add_.
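
For illustration, a minimal sketch (toy tensors, not this repo's code) of how scatter_add_ can accumulate the per-class sums that the for-loop currently computes:

    import torch

    # toy setup: 5 samples, 3 classes, 2-D features
    feature = torch.randn(5, 2)
    label = torch.tensor([0, 2, 1, 0, 2])
    centers = torch.randn(3, 2)

    diff = centers.index_select(0, label) - feature      # c_{y_i} - x_i for each sample

    # per-class sample counts; initialised to 1 to match the (1 + count) denominator of Eq. 4
    counts = torch.ones(3)
    counts.scatter_add_(0, label, torch.ones(5))

    # per-class sums of the differences, accumulated without a Python loop
    grad_centers = torch.zeros(3, 2)
    grad_centers.scatter_add_(0, label.unsqueeze(1).expand(5, 2), diff)
    grad_centers = grad_centers / counts.view(-1, 1)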

it doesn't seem to work well

The top-1 accuracy is around 11%, the NLL loss is around 0.03, and the center loss is around 0.0001 (it barely changes at all). How does this happen?

# -*- coding: utf-8 -*-

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable
from torch.utils.data import DataLoader
import torch.optim.lr_scheduler as lr_scheduler
from CenterLoss import CenterLoss
import matplotlib.pyplot as plt

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1_1 = nn.Conv2d(1, 32, kernel_size=5, padding=2)
        self.prelu1_1 = nn.PReLU()
        self.conv1_2 = nn.Conv2d(32, 32, kernel_size=5, padding=2)
        self.prelu1_2 = nn.PReLU()
        self.conv2_1 = nn.Conv2d(32, 64, kernel_size=5, padding=2)
        self.prelu2_1 = nn.PReLU()
        self.conv2_2 = nn.Conv2d(64, 64, kernel_size=5, padding=2)
        self.prelu2_2 = nn.PReLU()
        self.conv3_1 = nn.Conv2d(64, 128, kernel_size=5, padding=2)
        self.prelu3_1 = nn.PReLU()
        self.conv3_2 = nn.Conv2d(128, 128, kernel_size=5, padding=2)
        self.prelu3_2 = nn.PReLU()
        self.preluip1 = nn.PReLU()
        self.ip1 = nn.Linear(128*3*3, 2)
        self.ip2 = nn.Linear(2, 10)

    def forward(self, x):
        x = self.prelu1_1(self.conv1_1(x))
        x = self.prelu1_2(self.conv1_2(x))
        x = F.max_pool2d(x, 2)
        x = self.prelu2_1(self.conv2_1(x))
        x = self.prelu2_2(self.conv2_2(x))
        x = F.max_pool2d(x, 2)
        x = self.prelu3_1(self.conv3_1(x))
        x = self.prelu3_2(self.conv3_2(x))
        x = F.max_pool2d(x, 2)
        x = x.view(-1, 128*3*3)
        ip1 = self.preluip1(self.ip1(x))
        ip2 = self.ip2(ip1)
        return ip1, F.log_softmax(ip2)

'''
def visualize(feat, labels, epoch):
    plt.ion()
    c = ['#ff0000', '#ffff00', '#00ff00', '#00ffff', '#0000ff',
         '#ff00ff', '#990000', '#999900', '#009900', '#009999']
    plt.clf()
    for i in range(10):
        plt.plot(feat[labels == i, 0], feat[labels == i, 1], '.', c=c[i])
    plt.legend(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], loc='upper right')
    plt.xlim(xmin=-5, xmax=5)
    plt.ylim(ymin=-5, ymax=5)
    plt.text(-4.8, 4.6, "epoch=%d" % epoch)
    plt.savefig('./images/epoch=%d.jpg' % epoch)
    plt.draw()
    plt.pause(0.001)
'''

def main():
    use_cuda = torch.cuda.is_available()

    # Dataset
    trainset = datasets.MNIST('../data', download=True, train=True, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))]))
    train_loader = DataLoader(trainset, batch_size=128, shuffle=True, num_workers=4)

    testset = datasets.MNIST('../data', download=True, train=False, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))]))
    test_loader = DataLoader(testset, batch_size=128, shuffle=True, num_workers=4)

    # Model
    model = Net()
    print(model)

    # NLLLoss
    nllloss = nn.NLLLoss()  # CrossEntropyLoss = log_softmax + NLLLoss
    # CenterLoss
    loss_weight = 1
    centerloss = CenterLoss(10, 2)
    if use_cuda:
        nllloss = nllloss.cuda()
        centerloss = centerloss.cuda()
        model = model.cuda()
    criterion = [nllloss, centerloss]

    # optimizer4nn
    optimizer4nn = optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005)
    sheduler = lr_scheduler.StepLR(optimizer4nn, 20, gamma=0.8)

    # optimizer4center
    optimizer4center = optim.SGD(centerloss.parameters(), lr=0.5)

    for epoch in range(50):
        sheduler.step()
        print('epoch {}'.format(epoch + 1))
        # print(optimizer4nn.param_groups[0]['lr'])
        print("Training... Epoch = %d" % epoch)
        ip1_loader = []
        idx_loader = []
        train_loss = 0.
        train_acc = 0.
        train_nll = 0.
        train_cen = 0.
        for i, (data, target) in enumerate(train_loader):
            if use_cuda:
                data = data.cuda()
                target = target.cuda()
            data, target = Variable(data), Variable(target)

            ip1, pred = model(data)
            # out = torch.max(pred, 1)[1]
            loss = nllloss(pred, target) + loss_weight * centerloss(target, ip1)
            train_loss += loss.data[0]
            out = torch.max(pred, 1)[1]
            train_correct = (out == target).sum()
            train_acc += train_correct.data[0]
            train_nll += nllloss(pred, target).data[0]
            train_cen += centerloss(target, ip1).data[0]

            optimizer4nn.zero_grad()
            optimizer4center.zero_grad()

            loss.backward()

            optimizer4nn.step()
            optimizer4center.step()

            # ip1_loader.append(ip1)
            # idx_loader.append(target)

        print('Train Loss: {:.6f}, Acc: {:.6f}, nn Loss: {:.6f}, centerloss: {:.6f}'.format(
            train_loss / len(trainset), train_acc / len(trainset),
            train_nll / len(trainset), train_cen / len(trainset)))
        '''
        feat = torch.cat(ip1_loader, 0)
        labels = torch.cat(idx_loader, 0)
        visualize(feat.data.cpu().numpy(), labels.data.cpu().numpy(), epoch)
        '''
        model.eval()
        eval_loss = 0.
        eval_acc = 0.
        for i, (data, target) in enumerate(test_loader):
            if use_cuda:
                data = data.cuda()
                target = target.cuda()
            data, target = Variable(data), Variable(target)

            ip1, pred = model(data)
            # pred = torch.max(pred, 1)[1]
            loss = nllloss(pred, target) + loss_weight * centerloss(target, ip1)
            eval_loss += loss.data[0]
            out = torch.max(pred, 1)[1]
            eval_correct = (out == target).sum()
            eval_acc += eval_correct.data[0]

            # optimizer[0].zero_grad()
            # optimizer[1].zero_grad()
            # loss.backward()
            # optimizer[0].step()
            # optimizer[1].step()
            # ip1_loader.append(ip1)
            # idx_loader.append(target)

        print('Test Loss: {:.6f}, Acc: {:.6f}'.format(
            eval_loss / len(testset), eval_acc / len(testset)))

if __name__ == '__main__':
    main()

'''
epoch 9
Training... Epoch = 8
Train Loss: 0.017925, Acc: 0.112367
Test Loss: 0.018107, Acc: 0.113500
epoch 10
Training... Epoch = 9
Train Loss: 0.017911, Acc: 0.112367
Test Loss: 0.018094, Acc: 0.11350
'''

about loss

In my project's experiments, the center loss does not decrease with the number of iterations; it changes seemingly at random. I don't quite understand what's going on and would like to ask you about it. Thanks.

About figure of features

I wonder how to change the code to plot the figure of features trained with only the softmax loss.
Also, what do you think about the difference in shape between Fig. 3 in the original paper and the figure produced by your code?

Question about NLLLoss and CrossEntropyLoss

Hi, I see that the code uses F.log_softmax() and nn.NLLLoss() to compute the softmax cross-entropy loss.
I used nn.CrossEntropyLoss() instead, but the result is very bad: the test accuracy is only 93%.
Could you please tell me what's wrong? Thank you!
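
For reference, nn.CrossEntropyLoss combines log_softmax and NLLLoss in one module. Since Net.forward in this repo already applies F.log_softmax, feeding its output to CrossEntropyLoss applies log_softmax twice, which is a likely source of trouble. A minimal sketch of the two consistent pairings (toy tensors, illustrative only):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    logits = torch.randn(4, 10)            # raw scores from the last fc layer
    target = torch.tensor([1, 0, 3, 9])

    # pairing A: model outputs log-probabilities -> use NLLLoss (what this repo does)
    log_probs = F.log_softmax(logits, dim=1)
    loss_a = nn.NLLLoss()(log_probs, target)

    # pairing B: model outputs raw logits -> use CrossEntropyLoss
    loss_b = nn.CrossEntropyLoss()(logits, target)

    print(torch.allclose(loss_a, loss_b))  # True: the two pairings are equivalent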

gradient needs to be divided by batch size

It appears that the gradient is not being divided by the batch size in CenterlossFunc().

I changed it to:

@staticmethod
def forward(ctx, feature, label, centers):
    ctx.save_for_backward(feature, label, centers)
    centers_batch = centers.index_select(0, label.long())
    # Eq. 2, averaged over the batch
    return (feature - centers_batch).pow(2).sum() / 2.0 / feature.size()[0]


@staticmethod
def backward(ctx, grad_output):
    feature, label, centers = ctx.saved_tensors
    centers_batch = centers.index_select(0, label.long())
    diff = centers_batch - feature
    # init every iteration
    counts = centers.new(centers.size(0)).fill_(1)
    ones = centers.new(label.size(0)).fill_(1)
    grad_centers = centers.new(centers.size()).fill_(0)

    # per-class counts and per-class sums of (c_j - x_i), i.e. Eq. 4
    counts = counts.scatter_add_(0, label.long(), ones)
    grad_centers.scatter_add_(0, label.unsqueeze(1).expand(feature.size()).long(), diff)
    grad_centers = grad_centers/(counts.view(-1, 1))
    # gradient w.r.t. the features is also averaged over the batch
    return - grad_output.data * diff / feature.size()[0], None, grad_centers

Figure now looks like: [feature visualization at epoch 90]

Only implemented softmax

loss = nllloss(pred, target) + loss_weight * centerloss(target, ip1)
With both the center loss and the softmax loss, the result is correct.
When I remove the center loss and use loss = nllloss(pred, target), the result is incorrect.

gradient needs to be divided by batch size v2

This issue has been raised before in #9.

But I find that the averaging only affects the gradient of Xi; the gradient of the centers is not divided by the batch size.

For example: suppose the gradient of Xi is Grad_Xi and the gradient of a center is Grad_center when the loss is L0. With the loss scaled to L0/10, the gradient of Xi becomes Grad_Xi/10, but the gradient of the center is still Grad_center.

Could you help check this?

question about center loss item

Excuse me, the formula says that Xi is a feature whose dimension changes with the network depth. How does the code reflect this? I can't see it. Is feat_dim set to the input dimension, and can it ensure that the optimized output dimension is the final one? Please answer my question, thank you.

about optimizer4nn & optimizer4center

Hello, thanks for your code; it has been very helpful for learning center loss. However, I don't understand what optimizer4center and optimizer4nn are for. Thanks for your reply.
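
For context, a minimal sketch of the two-optimizer pattern used in the training script quoted earlier (Net is the CNN and CenterLoss the module holding the class centers; the learning rates are the ones used above): optimizer4nn updates the network weights, while optimizer4center updates only the centers.

    import torch.optim as optim

    model = Net()                     # CNN producing the 2-D features
    centerloss = CenterLoss(10, 2)    # holds the 10 class centers as learnable parameters

    # drives the network weights with the combined softmax + weighted center loss
    optimizer4nn = optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005)
    # updates only the centers, with its own (much larger) learning rate
    optimizer4center = optim.SGD(centerloss.parameters(), lr=0.5)

    # each iteration: zero both optimizers, backprop the combined loss once, then step both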

visualize feature center

When visualizing the feature centers, why do you use the first and second dimensions of the feature as x and y?


Migration to 0.4

In PyTorch 0.4, Tensor and Variable are merged, so we need to migrate.
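
As a rough, illustrative sketch of what that migration involves (not this repo's actual diff): Variable wrappers are no longer needed, and scalar losses are read with .item() instead of .data[0].

    import torch

    # PyTorch 0.4 style: tensors carry autograd state directly, no Variable(...) wrapper
    x = torch.randn(3, 2, requires_grad=True)
    loss = (x ** 2).sum()
    loss.backward()

    print(loss.item())     # replaces loss.data[0] for reading a scalar value
    print(x.grad.shape)    # gradients live on the tensor itself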

About the equation(4)

Excuse me, I note that the gradient of Lc with respect to Ci computed from equation (2) is nearly the same as equation (4). So, could we update Ci through the automatic differentiation system as follows?

    class CenterLoss(nn.Module):
        def forward(self, label, feat):
            ...
            return Lc / batch_size    # equation (2)

    L = L1 + Lc
    L.backward()
