Comments (17)

xmba15 commented on August 18, 2024

Hi, have you made any improvements since then?
I am trying to implement ENet on top of Caffe too, testing on the Cityscapes dataset with images resized to 512x512. I cannot reach the accuracy reported in the paper. Any suggestions on weight initialization when implementing the network in Caffe?

chuanzihe commented on August 18, 2024

@Bajsk Hi, I have not tried it on Caffe. By the way, how did you convert the model to a Caffe model? How did you handle the parameters in the BN layers?

codeAC29 commented on August 18, 2024

@clairematrix What image size are you using?
@Bajsk Sorry, I won't be able to help you with Caffe. By the way, we used an image size of 256x512; you cannot use a resolution of 512x512.

chuanzihe commented on August 18, 2024

@codeAC29 Hi, I loaded the original Cityscapes images at 2048x1024. For training, I followed your README.md:

| Dataset    | Image size | Label size | # of classes |
|------------|------------|------------|--------------|
| Cityscapes | 512x256    | 64x32      | 19           |

xmba15 commented on August 18, 2024

@clairematrix I tried to implement it using layers in Caffe. I used Caffe's BN layer without a Scale layer and set eps to 1e-3, as in the Torch implementation.
@codeAC29 Can you tell me why we cannot use this network at a resolution of 512x512? I resized the original images and the training labels to 512x512 and constructed the network as described in your paper. You describe an example with a 512x512 input in Table 1 of the paper, right?
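
For reference, a minimal sketch of the Torch layer being mirrored (assuming the nn package; the channel count is illustrative):

require 'nn'

-- second argument is eps; 1e-3 matches the value discussed above
-- affine defaults to true, so gamma/beta are learned as part of this layer
local bn = nn.SpatialBatchNormalization(64, 1e-3)

(Note that Caffe's BatchNorm layer only normalizes; the learned gamma/beta normally live in a following Scale layer, so omitting it effectively drops those two parameters.)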

codeAC29 commented on August 18, 2024

@clairematrix The result you posted does not look like a 512x256 image. Ideally you should use the same resolution for both training and testing.
@Bajsk The original images are 1024x2048. If you rescale them to 512x512, you change the aspect ratio. The network would then learn, for example, a squeezed version of a car or a person, which you do not want.
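
A minimal sketch of the difference, assuming the torch image package (the file path is hypothetical):

require 'image'

local img = image.load('frame.png')     -- e.g. a 3x1024x2048 Cityscapes frame
-- image.scale takes (src, width, height)
local ok  = image.scale(img, 512, 256)  -- both sides divided by 4: aspect ratio preserved
local bad = image.scale(img, 512, 512)  -- width divided by 4 but height only by 2: cars and people end up squeezed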

ramonss commented on August 18, 2024

Could someone share the full list of option settings used to achieve the Cityscapes results in the paper? I'm also having difficulty reproducing them. Currently, I'm using 4x Titan X GPUs with the following command:

--learningRate 5e-4 --weightDecay 2e-4 --batchSize 10 --nGPU 4 --maxepoch 1000

I tried to follow the paper's suggestions for the learning rate and weight decay, but what about the total number of epochs and the other settings? I'm getting results similar to the following screenshot. In general, the network has trouble detecting the road near the ego vehicle. Is that expected?

[screenshot: segmentation result with the road near the ego vehicle missing]

Do I need to use some special parameters to visualize the results? This is the command I used:

qlua demo.lua -d train/trained_models/ --net 320 -i train/data/Cityscapes/leftImg8bit/test/bonn/ -m cityscapes

The net number above is the best epoch from decoder training so far, after 350 epochs of decoder training and 1000 epochs of encoder training.

Thanks in advance for any help.

chuanzihe commented on August 18, 2024

@codeAC29 Thanks. I do use the same resolution for both training and testing. The image I posted was rescaled with -r for visualization, as you suggested in issue #4.

@ramonss Thank you for sharing your options; that was helpful. I was wondering whether this had anything to do with the number of GPUs until I saw yours. Reportedly, one group has successfully reproduced it in MXNet, even with a batch size of 1. What could the significant differences be when training ENet on different platforms?

chuanzihe commented on August 18, 2024

@codeAC29 It seems the results trained by @ramonss and me show the same problem in the lower part of the image. It looks as if training were incomplete, which contradicts the converged training and testing errors. Any advice? Thank you again for answering so many of my issues (your work is truly brilliant!).

[screenshot: segmentation result with artifacts in the lower part of the image]

codeAC29 commented on August 18, 2024

The default options set in opts.lua were the ones we used. We never went beyond 300 epochs, and the network generally converged for us within ~250 epochs. Moreover, unlike the result you are getting in the image above, the detection of the ego vehicle is crisp for us, as you can see in the results provided in the paper.

ramonss commented on August 18, 2024

@codeAC29 Thanks for the answer. I tried again with the default options (commands below), but still no luck. The error does not go below 0.37 for the encoder after 200 epochs, and I have the same problem with the decoder (its errors are even higher because of errors accumulated from the encoder).

th run.lua --dataset cs --datapath data/Cityscapes --model models/encoder.lua --save trained_models/encoder/ --cachepath /home/caduser/ENet-training/train/dataset_cache/encoder/ --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64

th run.lua --dataset cs --datapath data/Cityscapes --model models/decoder.lua --save trained_models/decoder/ --CNNEncoder trained_models/encoder/model-best.net --imHeight 256 --imWidth 512 --labelHeight 256 --labelWidth 512 --cachepath /home/caduser/ENet-training/train/dataset_cache/decoder/

In addition to the training not converging, when I use the trained model you made available at https://www.dropbox.com/sh/dywzk3gyb12hpe5/AAD5YkUa8XgMpHs2gCRgmCVCa, I get the same visualization problem near the ego vehicle on Cityscapes. Perhaps I'm using the wrong command line for visualization? Could you please let me know if the following is correct?

qlua demo.lua -d ../train/trained_models/ -i ../train/data/Cityscapes/leftImg8bit/test/bonn/ -m github_sample -v --net 1

(where github_sample is the name of the folder containing the trained model, and 1 is the model number after I renamed model-best.net to model-1.net; that was the only way I found to run demo.lua without parameter errors)

Thanks in advance for your help.

codeAC29 commented on August 18, 2024

Try the following two commands and let me know what you get for each:

qlua demo.lua -d ../train/trained_models/ -i ../train/data/Cityscapes/leftImg8bit/test/bonn/ -m github_sample -v --net 1 -r 0.5

and

qlua demo.lua -d ../train/trained_models/ -i ../train/data/Cityscapes/leftImg8bit/test/bonn/ -m github_sample -v --net 1 -r 0.25

ramonss commented on August 18, 2024

@codeAC29 Thanks for the suggestion. It worked ;-)

I can also confirm that even with high error values during training (around 0.6 for the decoder after 300 epochs), I now see much better results in the visualizer when resizing by 0.5 or 0.25. I guess that is because we train on a resized version (512x256) of the original Cityscapes images, right?

Thanks again for your help. Really appreciated.

codeAC29 commented on August 18, 2024

That is correct, @ramonss. Since training was done at a smaller resolution, it makes more sense (at least to me) to visualize at a smaller resolution as well.
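
As a quick sanity check of the numbers (a sketch assuming the torch image package; the file path is hypothetical), -r 0.25 applied to a full-resolution frame recovers exactly the 512x256 training resolution:

require 'image'

local img = image.load('frame.png')   -- a 3x1024x2048 Cityscapes frame
local r = 0.25
-- size(3) is width and size(2) is height for a CxHxW tensor
local vis = image.scale(img, img:size(3) * r, img:size(2) * r)  -- 2048*0.25=512, 1024*0.25=256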

apaszke commented on August 18, 2024

In fact, some of the default values in the repo didn't match the ones we reported in the paper (and the ones I used to train on Cityscapes). We used a learning rate of 5e-4 and a weight decay of 2e-4. Also, we trained on 1024x512 images for Cityscapes. You might want to check these settings. Sorry for the trouble!
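
A hedged sketch of those settings expressed in opts.lua terms (the option names follow the commands earlier in this thread; treat this as illustrative, not as the exact contents of the file):

require 'torch'

local cmd = torch.CmdLine()
cmd:option('-learningRate', 5e-4, 'learning rate reported in the paper')
cmd:option('-weightDecay',  2e-4, 'weight decay reported in the paper')
cmd:option('-imWidth',      1024, 'Cityscapes training image width')
cmd:option('-imHeight',     512,  'Cityscapes training image height')
local opt = cmd:parse(arg)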

adroit91 commented on August 18, 2024

Hi @apaszke

Any recommendations on a different learning rate, weight decay, or image dimensions to achieve the same results as in the paper, using this same code, on the CamVid dataset? We have tried small changes, but cannot get below 0.54 error, and even that converges around 35-45 epochs for the encoder.

TimoSaemann commented on August 18, 2024

@clairehe @Bajsk If you are still interested in a Caffe implementation, you may find this ENet repository helpful.
