Comments (17)

xmba15 commented on August 18, 2024

Hi, have you made any improvements since then?
I am trying to implement ENet on top of Caffe too, testing on the Cityscapes dataset with images resized to 512x512. I cannot reach the accuracy reported in the paper. Any suggestions on weight initialization when implementing the network in Caffe?

chuanzihe commented on August 18, 2024

@Bajsk Hi, I have not tried it on Caffe. By the way, how did you convert the model to a Caffe model? How did you handle the parameters in the BN layers?

codeAC29 commented on August 18, 2024

@clairematrix What image size are you using?
@Bajsk Sorry, I won't be able to help you with Caffe. By the way, we used an image size of 256x512; you cannot use a resolution of 512x512.

chuanzihe commented on August 18, 2024

@codeAC29 Hi, I loaded the original Cityscapes images at 2048x1024. For training, I followed your README.md:

| Dataset    | Image size | Label size | # of classes |
|------------|------------|------------|--------------|
| Cityscapes | 512x256    | 64x32      | 19           |

xmba15 commented on August 18, 2024

@clairematrix I tried to implement it using layers in Caffe. I used Caffe's BN layer without a Scale layer and set eps to 1e-3, as in the Torch implementation.
@codeAC29 Can you tell me why we cannot use this network at a resolution of 512x512? I resized the original images and the training labels to 512x512 and constructed the network as described in your paper. You describe an example with a 512x512 input in Table 1 of the paper, right?
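
For reference, a minimal sketch of the Torch layer being mirrored (assuming the nn package; the channel count is illustrative):

require 'nn'

-- second argument is eps; 1e-3 matches the value discussed above
-- affine defaults to true, so gamma/beta are learned as part of this layer
local bn = nn.SpatialBatchNormalization(64, 1e-3)

(Note that Caffe's BatchNorm layer only normalizes; the learned gamma/beta normally live in a following Scale layer, so omitting it effectively drops those two parameters.)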

codeAC29 commented on August 18, 2024

@clairematrix The result you posted does not look like a 512x256 image. Ideally you should use the same resolution for both training and testing.
@Bajsk The original images are 1024x2048. If you rescale them to 512x512, you change the aspect ratio. The network would then learn, for example, a squeezed version of a car or a person, which you do not want.
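
A minimal sketch of the difference, assuming the torch image package (the file path is hypothetical):

require 'image'

local img = image.load('frame.png')     -- e.g. a 3x1024x2048 Cityscapes frame
-- image.scale takes (src, width, height)
local ok  = image.scale(img, 512, 256)  -- both sides divided by 4: aspect ratio preserved
local bad = image.scale(img, 512, 512)  -- width divided by 4 but height only by 2: cars and people end up squeezed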

ramonss commented on August 18, 2024

Could someone share the full list of option settings used to achieve the Cityscapes results in the paper? I'm also having difficulty reproducing them. Currently, I'm using 4x Titan X GPUs with the following command:

--learningRate 5e-4 --weightDecay 2e-4 --batchSize 10 --nGPU 4 --maxepoch 1000

I tried to follow the paper's suggestions for the learning rate and weight decay, but what about the total number of epochs and the other settings? I'm getting results similar to the following screenshot. In general, the network has trouble detecting the road near the ego vehicle. Is that expected?

[screenshot: segmentation result with the road near the ego vehicle missing]

Do I need to use some special parameters to visualize the results? This is the command I used:

qlua demo.lua -d train/trained_models/ --net 320 -i train/data/Cityscapes/leftImg8bit/test/bonn/ -m cityscapes

The net number above is the best epoch from decoder training so far, after 350 epochs of decoder training and 1000 epochs of encoder training.

Thanks in advance for any help.

chuanzihe commented on August 18, 2024

@codeAC29 Thanks. I do use the same resolution for both training and testing. The image I posted was rescaled with -r for visualization, as you suggested in issue #4.

@ramonss Thank you for sharing your options; that was helpful. I was wondering whether this had anything to do with the number of GPUs until I saw yours. Reportedly, one group has successfully reproduced it in MXNet, even with a batch size of 1. What could the significant differences be when training ENet on different platforms?

chuanzihe commented on August 18, 2024

@codeAC29 It seems the results trained by @ramonss and me show the same problem in the lower part of the image. It looks as if training were incomplete, which contradicts the converged training and testing errors. Any advice? Thank you again for answering so many of my issues (your work is truly brilliant!).

[screenshot: segmentation result with artifacts in the lower part of the image]

codeAC29 commented on August 18, 2024

The default options set in opts.lua were the ones we used. We never went beyond 300 epochs, and the network generally converged for us within ~250 epochs. Moreover, unlike the result you are getting in the image above, the detection of the ego vehicle is crisp for us, as you can see in the results provided in the paper.

ramonss commented on August 18, 2024

@codeAC29 Thanks for the answer. I tried again with the default options (commands below), but still no luck. The error does not go below 0.37 for the encoder after 200 epochs, and I have the same problem with the decoder (its errors are even higher because of errors accumulated from the encoder).

th run.lua --dataset cs --datapath data/Cityscapes --model models/encoder.lua --save trained_models/encoder/ --cachepath /home/caduser/ENet-training/train/dataset_cache/encoder/ --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64

th run.lua --dataset cs --datapath data/Cityscapes --model models/decoder.lua --save trained_models/decoder/ --CNNEncoder trained_models/encoder/model-best.net --imHeight 256 --imWidth 512 --labelHeight 256 --labelWidth 512 --cachepath /home/caduser/ENet-training/train/dataset_cache/decoder/

In addition to the training not converging, when I use the trained model you made available at https://www.dropbox.com/sh/dywzk3gyb12hpe5/AAD5YkUa8XgMpHs2gCRgmCVCa, I get the same visualization problem near the ego vehicle on Cityscapes. Perhaps I'm using the wrong command line for visualization? Could you please let me know if the following is correct?

qlua demo.lua -d ../train/trained_models/ -i ../train/data/Cityscapes/leftImg8bit/test/bonn/ -m github_sample -v --net 1

(where github_sample is the name of the folder containing the trained model, and 1 is the model number after I renamed model-best.net to model-1.net; that was the only way I found to run demo.lua without parameter errors)

Thanks in advance for your help.

codeAC29 commented on August 18, 2024

Try the following two commands and let me know what you get for each:

qlua demo.lua -d ../train/trained_models/ -i ../train/data/Cityscapes/leftImg8bit/test/bonn/ -m github_sample -v --net 1 -r 0.5

and

qlua demo.lua -d ../train/trained_models/ -i ../train/data/Cityscapes/leftImg8bit/test/bonn/ -m github_sample -v --net 1 -r 0.25

ramonss commented on August 18, 2024

@codeAC29 Thanks for the suggestion. It worked ;-)

I can also confirm that even with high error values during training (around 0.6 for the decoder after 300 epochs), I now see much better results in the visualizer when resizing by 0.5 or 0.25. I guess that is because we train on a resized version (512x256) of the original Cityscapes images, right?

Thanks again for your help. Really appreciated.

codeAC29 commented on August 18, 2024

That is correct, @ramonss. Since training was done at a smaller resolution, it makes more sense (at least to me) to visualize at a smaller resolution as well.
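
As a quick sanity check of the numbers (a sketch assuming the torch image package; the file path is hypothetical), -r 0.25 applied to a full-resolution frame recovers exactly the 512x256 training resolution:

require 'image'

local img = image.load('frame.png')   -- a 3x1024x2048 Cityscapes frame
local r = 0.25
-- size(3) is width and size(2) is height for a CxHxW tensor
local vis = image.scale(img, img:size(3) * r, img:size(2) * r)  -- 2048*0.25=512, 1024*0.25=256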

apaszke commented on August 18, 2024

In fact, some of the default values in the repo didn't match the ones we reported in the paper (and the ones I used to train on Cityscapes). We used a learning rate of 5e-4 and a weight decay of 2e-4. Also, we trained on 1024x512 images for Cityscapes. You might want to check these settings. Sorry for the trouble!
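
A hedged sketch of those settings expressed in opts.lua terms (the option names follow the commands earlier in this thread; treat this as illustrative, not as the exact contents of the file):

require 'torch'

local cmd = torch.CmdLine()
cmd:option('-learningRate', 5e-4, 'learning rate reported in the paper')
cmd:option('-weightDecay',  2e-4, 'weight decay reported in the paper')
cmd:option('-imWidth',      1024, 'Cityscapes training image width')
cmd:option('-imHeight',     512,  'Cityscapes training image height')
local opt = cmd:parse(arg)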

adroit91 commented on August 18, 2024

Hi @apaszke

Any recommendations on a different learning rate, weight decay, or image dimensions to achieve the same results as in the paper, using this same code, on the CamVid dataset? We have tried small changes, but cannot get below 0.54 error, and even that converges around 35-45 epochs for the encoder.

TimoSaemann commented on August 18, 2024

@clairehe @Bajsk If you are still interested in a Caffe implementation, you may find this ENet repository helpful.
