densenet.pytorch's Issues

Help needed on reproducing the performance on Cifar-100

I used the default settings (which I believe correspond to DenseNet-BC with k = 12 and data augmentation) on CIFAR-100, changing only the dataset class name and the nClasses variable. The training curve looks like this:
[training curve plot]
Although training has not finished yet, from the training curves of other networks on CIFAR-100 I can tell there will be no further major changes in accuracy. The highest accuracy so far is 75.59%, which only matches the reported performance of the depth-40, k = 12 DenseNet with data augmentation.
Has anyone tested this repo on CIFAR-100 yet?

How did you create the header.png?

I'm quite impressed with how you've presented your densenet implementation.

V-Net is a bit messier in terms of needing substantial preprocessing of the data set, a custom loader, and a custom loss function. Nonetheless, I'm patterning the presentation of my implementation https://github.com/mattmacy/vnet.pytorch after yours, and I'm wondering how you created the header.png image.

Thanks in advance.

Error when loading the saved model

Hi,

I modified your code to train a model with my own dataset, and I am trying to load the model saved as "latest.pth" to do some tests. However, I am getting this error:

AttributeError: 'DenseNet' object has no attribute 'copy'

The code I use to load the model is:

    net.load_state_dict(torch.load(checkpoint, map_location=lambda storage, loc: storage))

where checkpoint is the path to "latest.pth".

Any help would be appreciated.

Thanks
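
For what it's worth, the train.py in this repo saves the entire module with torch.save(net, ...) rather than a state dict, so torch.load returns a DenseNet instance and calling load_state_dict on it fails with exactly this AttributeError. A minimal sketch of two ways around it, assuming the checkpoint was written by the unmodified training script:

    # Option 1: the checkpoint already holds the whole module; load it directly.
    net = torch.load(checkpoint, map_location=lambda storage, loc: storage)

    # Option 2: change the saving side to write a state dict instead:
    #   torch.save(net.state_dict(), os.path.join(args.save, 'latest.pth'))
    # then the original loading line works unchanged:
    net.load_state_dict(torch.load(checkpoint, map_location=lambda storage, loc: storage))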

Multi-GPU implementation

Hi author

Thanks for sharing your code. I notice in the README you said "Multi-gpu help wanted". If you mean data parallelism, it can be implemented in a few lines in PyTorch using nn.DataParallel.

In your train.py (line 82), simply change

    if args.cuda:
        net = net.cuda()

to

    if args.cuda:
        net = net.cuda()
        net = nn.DataParallel(net, device_ids=[0, 1, 2, 3])

to make the whole model data-parallel; nn.DataParallel splits each input batch across the listed GPUs.

Cat vs Dog

I have slightly modified your algorithm and adapted it for two classes (k = 12, reduction = 0.5, bottleneck = True). When I train it on cat and dog images from CIFAR-10, I only reach 82% validation accuracy. Is that what you get as well? Or do you get something closer to the accuracy for all 10 classes, i.e. > 95%?
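
For anyone trying to reproduce this, here is a minimal sketch of one way to carve the two-class subset out of torchvision's CIFAR-10. The class indices 3 = cat and 5 = dog are the standard CIFAR-10 labels, but the in-place filtering below is my own approach and assumes a torchvision version that exposes .data and .targets:

    import torchvision

    ds = torchvision.datasets.CIFAR10(root='cifar', train=True, download=True)
    labels = ds.targets
    keep = [i for i, y in enumerate(labels) if y in (3, 5)]  # 3 = cat, 5 = dog
    ds.data = ds.data[keep]                                  # numpy fancy indexing
    ds.targets = [0 if labels[i] == 3 else 1 for i in keep]  # remap labels to 0/1

The network would then be built with nClasses=2, as described above.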

Question: What is the purpose of this piece of code in densenet.py?

I'm learning CNN architectures. Can you please tell me what this piece of code does and why it is needed? I could not relate it to the paper.

    for m in self.modules():
        if isinstance(m, nn.Conv2d):
            # He (Kaiming) initialization: zero-mean Gaussian with
            # std = sqrt(2 / n), where n is the conv layer's fan-out
            n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
            m.weight.data.normal_(0, math.sqrt(2. / n))
        elif isinstance(m, nn.BatchNorm2d):
            # BatchNorm starts as the identity: scale 1, shift 0
            m.weight.data.fill_(1)
            m.bias.data.zero_()
        elif isinstance(m, nn.Linear):
            # the classifier bias starts at zero
            m.bias.data.zero_()
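
For context, this is the He et al. (2015) weight initialization scheme: convolution weights are drawn from a zero-mean Gaussian with std sqrt(2/n) where n is the fan-out, BatchNorm layers start as the identity, and biases start at zero. A sketch of the equivalent written with torch.nn.init, assuming a reasonably recent PyTorch (kaiming_normal_ with mode='fan_out' computes the same standard deviation):

    import torch.nn as nn

    for m in self.modules():
        if isinstance(m, nn.Conv2d):
            # same as normal_(0, sqrt(2 / (k_h * k_w * out_channels)))
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        elif isinstance(m, nn.BatchNorm2d):
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.Linear):
            nn.init.constant_(m.bias, 0)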

Add bug solution/fix to bug discussion page

Hey,

I think it would be good to include how your CIFAR-10 convergence problem was solved. At the moment the discussion page only includes the problem details.

Good to hear you got it working.

There is a size mismatch due to these lines:

    out = self.dense3(out)
    out = torch.squeeze(F.avg_pool2d(F.relu(self.bn1(out)), 8))

I solved the issue by changing those lines to

    out = self.dense3(out)
    out = self.relu(self.bn1(out))
    out = F.avg_pool2d(out, 8)
    out = out.view(-1, self.nChannels)

where self.relu has been initialised as self.relu = nn.ReLU(inplace=True). Using view(-1, self.nChannels) instead of torch.squeeze also avoids accidentally dropping the batch dimension when the batch size is 1.

Why is there a PID on device 0 when I call cuda(1) everywhere?

[screenshot: 2017-05-17-160814_1914x1005_scrot]

I have changed train.py so that everything is moved with cuda(1), as you can see below. But why does the same PID also appear on device 0? Am I missing something?

#!/usr/bin/env python3

import argparse
import os
import setproctitle
import shutil

import densenet
import torch
from torch import optim
from torch.autograd import Variable
from torch.nn import functional as F
from torch.utils.data import DataLoader
import torchvision
from torchvision import transforms


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--batchSz', type=int, default=64)
    parser.add_argument('--nEpochs', type=int, default=300)
    parser.add_argument('--no-cuda', action='store_false')
    parser.add_argument('--save')
    parser.add_argument('--seed', type=int, default=1)
    parser.add_argument(
            '--opt', type=str, default='sgd',
            choices=('sgd', 'adam', 'rmsprop'))
    args = parser.parse_args()

    args.cuda = args.no_cuda and torch.cuda.is_available()
    if args.cuda:
        torch.cuda.manual_seed(args.seed)

    args.save = args.save or 'work/densenet.base'
    setproctitle.setproctitle(args.save)
    if os.path.exists(args.save):
        shutil.rmtree(args.save)
    os.makedirs(args.save, exist_ok=True)

    torch.manual_seed(args.seed)

    normMean = [0.49139968, 0.48215827, 0.44653124]
    normStd = [0.24703233, 0.24348505, 0.26158768]
    normTransform = transforms.Normalize(normMean, normStd)
    trainTransform = transforms.Compose([
            transforms.RandomCrop(32, padding=4),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normTransform
    ])
    testTransform = transforms.Compose([
            transforms.ToTensor(),
            normTransform
    ])

    kwargs = {'num_workers': 1, 'pin_memory': True} if args.cuda else {}
    trainLoader = DataLoader(
            torchvision.datasets.CIFAR10(
                    root='cifar',
                    train=True,
                    download=True,
                    transform=trainTransform),
            batch_size=args.batchSz, shuffle=True, **kwargs)
    testLoader = DataLoader(
            torchvision.datasets.CIFAR10(
                    root='cifar',
                    train=False,
                    download=True,
                    transform=testTransform),
            batch_size=args.batchSz, shuffle=False, **kwargs)

    net = densenet.DenseNet(
            growthRate=12,
            depth=100,
            reduction=0.5,
            bottleneck=True,
            nClasses=10)

    print('  + Number of params: {}'.format(
            sum([p.data.nelement() for p in net.parameters()])))
    if args.cuda:
        net = net.cuda(1)

    if args.opt == 'sgd':
        optimizer = optim.SGD(
                net.parameters(), lr=1e-1, momentum=0.9, weight_decay=1e-4)
    elif args.opt == 'adam':
        optimizer = optim.Adam(net.parameters(), weight_decay=1e-4)
    elif args.opt == 'rmsprop':
        optimizer = optim.RMSprop(net.parameters(), weight_decay=1e-4)

    trainF = open(os.path.join(args.save, 'train.csv'), 'w')
    testF = open(os.path.join(args.save, 'test.csv'), 'w')

    for epoch in range(1, args.nEpochs + 1):
        adjust_opt(args.opt, optimizer, epoch)
        train(args, epoch, net, trainLoader, optimizer, trainF)
        test(args, epoch, net, testLoader, optimizer, testF)
        torch.save(net, os.path.join(args.save, 'latest.pth'))
        os.system('./plot.py {} &'.format(args.save))

    trainF.close()
    testF.close()


def train(args, epoch, net, trainLoader, optimizer, trainF):
    net.train()
    nProcessed = 0
    nTrain = len(trainLoader.dataset)
    for batch_idx, (data, target) in enumerate(trainLoader):
        if args.cuda:
            data, target = data.cuda(1), target.cuda(1)
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = net(data)
        loss = F.nll_loss(output, target)
        # make_graph.save('/tmp/t.dot', loss.creator); assert(False)
        loss.backward()
        optimizer.step()
        nProcessed += len(data)
        pred = output.data.max(1)[1]  # index of the max log-probability
        incorrect = pred.ne(target.data).cpu().sum()
        err = 100.0 * incorrect / len(data)
        partialEpoch = epoch + batch_idx / len(trainLoader) - 1
        print('Train Epoch: {:.2f} [{}/{} ({:.0f}%)]\t'
              'Loss: {:.6f}\tError: {:.6f}'.format(
                  partialEpoch, nProcessed, nTrain,
                  100. * batch_idx / len(trainLoader),
                  loss.data[0], err))
        trainF.write('{},{},{}\n'.format(partialEpoch, loss.data[0], err))
        trainF.flush()


def test(args, epoch, net, testLoader, optimizer, testF):
    net.eval()
    test_loss = 0
    incorrect = 0
    for data, target in testLoader:
        if args.cuda:
            data, target = data.cuda(1), target.cuda(1)
        data, target = Variable(data, volatile=True), Variable(target)
        output = net(data)
        test_loss += F.nll_loss(output, target).data[0]
        pred = output.data.max(1)[1]  # index of the max log-probability
        incorrect += pred.ne(target.data).cpu().sum()
    test_loss /= len(testLoader)  # loss function already averages over batch size
    nTotal = len(testLoader.dataset)
    err = 100.0 * incorrect / nTotal
    print()
    print(
            'Test set: Average loss: {:.4f}\n'
            'Error: {}/{} ({:.0f}%)\n'.format(
                    test_loss,
                    incorrect, nTotal, err))

    testF.write('{},{},{}\n'.format(epoch, test_loss, err))
    testF.flush()


def adjust_opt(optAlg, optimizer, epoch):
    if optAlg == 'sgd':
        if epoch < 150:
            lr = 1e-1
        elif epoch == 150:
            lr = 1e-2
        elif epoch == 225:
            lr = 1e-3
        else:
            return

        for param_group in optimizer.param_groups:
            param_group['lr'] = lr

if __name__ == '__main__':
    main()
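
A likely explanation, though I haven't confirmed it on this exact setup: even with the model and tensors on GPU 1, operations that touch the current device (torch.cuda.manual_seed, pinned-memory staging from pin_memory=True) initialize a CUDA context on device 0, so nvidia-smi lists the process there as well. A sketch of two common workarounds:

    # Workaround 1 (shell): hide device 0 from the process entirely; the
    # remaining GPU then becomes device 0 inside PyTorch, so plain .cuda()
    # works everywhere:
    #   CUDA_VISIBLE_DEVICES=1 python train.py

    # Workaround 2 (Python): make device 1 current before any CUDA work, so
    # the context is created there instead of on device 0.
    import torch
    torch.cuda.set_device(1)
    torch.cuda.manual_seed(args.seed)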

Is the DenseBlock Implementation correct?

Looking at your DenseBlock implementation, I don't see how the activations of layers earlier than the immediately preceding one are propagated to the later layers. Is the implementation really the same as in the DenseNet paper?
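
If I'm reading the implementation right, the dense connectivity is implicit rather than explicit: each layer ends with torch.cat((x, out), 1), so its output carries its own new feature maps plus everything it received, and the next layer's input is therefore the concatenation of all earlier layers' outputs. A toy sketch of the pattern (the class below is illustrative, not the repo's code):

    import torch
    import torch.nn as nn

    class ToyDenseLayer(nn.Module):
        def __init__(self, nIn, growthRate):
            super().__init__()
            self.conv = nn.Conv2d(nIn, growthRate, kernel_size=3, padding=1)

        def forward(self, x):
            out = self.conv(x)
            # concatenating the input keeps every earlier layer's feature
            # maps flowing forward -- this is the "dense" connection
            return torch.cat((x, out), 1)

    # layer i sees nIn + i * growthRate input channels
    block = nn.Sequential(ToyDenseLayer(16, 12), ToyDenseLayer(28, 12))
    y = block(torch.randn(1, 16, 8, 8))  # shape: 1 x 40 x 8 x 8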

How can it be run on CIFAR-100?

I changed the dataset class name in the code from CIFAR10 to CIFAR100 but got several errors during loss.backward(), such as CUDNN_STATUS_MAPPING_ERROR or "cublas runtime error: the gpu program failed to execute". So I guess there must be something specific to CIFAR-10 in this code, but I can't find it.
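
A likely cause, though it's an assumption since I can't reproduce the setup: if the network is still constructed with nClasses=10, CIFAR-100 labels 10-99 index past the classifier output and trigger a device-side assert, which surfaces as opaque cuDNN/cuBLAS errors like the ones above. A sketch of the two changes that should be needed in train.py:

    trainLoader = DataLoader(
            torchvision.datasets.CIFAR100(   # was CIFAR10; same for testLoader
                    root='cifar', train=True,
                    download=True, transform=trainTransform),
            batch_size=args.batchSz, shuffle=True, **kwargs)

    net = densenet.DenseNet(growthRate=12, depth=100, reduction=0.5,
                            bottleneck=True, nClasses=100)  # was nClasses=10

The normalization constants in the script are CIFAR-10's; CIFAR-100's statistics differ slightly, but that alone would not cause a crash.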
