README instructions not working for training on my own dataset, about hellochick/icnet-tensorflow

Comments (46)

ogail commented on July 28, 2024 8

@hellochick I finally got it working. Here are the steps I took:

1. Commented out the net.load line
2. Set the number of classes to 2
3. Set IGNORE_LABEL to an arbitrary number other than 0 or 255 (I set it to 100)

I then trained the network and got good prediction results (I had to update inference.py and tools.py to get this working):

Here is the original image

For training, I did the following:

1. Ran python train.py for 8 hrs until the loss reached 0.281, then stopped.
2. Ran python train.py --update-mean-var --train-beta-gamma (still running); the loss is dropping to 0.27 and continuing.

When you trained on other datasets, how did you use train.py and train.py --update-mean-var --train-beta-gamma (meaning for how long, and for what purpose)?

from icnet-tensorflow.

BCJuan commented on July 28, 2024 7

Hi,
In response to @qmy612 (also @ogail): you can indeed use the pretrained model.

I achieved it yesterday by doing the following:

1. As in #20, update icnet_cityscapes_bnnomerge.prototxt by changing conv6_cls num_output from 19 to your number of classes (this is from @ogail's initial question).
2. Then go to network.py, to the load function of class Network, and add the line if 'conv6_cls' not in var.name: before the line session.run(var.assign(data)). Also change ignore_missing to True.

The function should look something like:

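def load(self, data_path, session, ignore_missing=True):
    # Load the pretrained weights: a pickled dict of {op_name: {param_name: array}}.
    # Note: newer NumPy versions may also require allow_pickle=True here.
    data_dict = np.load(data_path, encoding='latin1').item()
    for op_name in data_dict:
        with tf.variable_scope(op_name, reuse=True):
            for param_name, data in data_dict[op_name].items():
                try:
                    # Batch-norm parameters are stored under different names.
                    if 'bn' in op_name:
                        param_name = BN_param_map[param_name]

                    var = tf.get_variable(param_name)
                    # Skip the classification layer, whose shape depends on the number of classes.
                    if 'conv6_cls' not in var.name:
                        session.run(var.assign(data))
                except ValueError:
                    # Variable not found in the graph: ignore it unless told otherwise.
                    if not ignore_missing:
                        raise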

Then you can optionally make the change stated in #20, i.e. changing

restore_var = tf.global_variables()

to

restore_var = [v for v in tf.global_variables() if 'conv6_cls' not in v.name]

Either way the effect is the same, since conv6_cls, the last (classification) layer of the net, has not been loaded from the pretrained model.
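The idea of loading everything except the classification layer can be sketched without TensorFlow. This is only an illustration, assuming the pretrained weights are available as a dict of NumPy arrays keyed by variable name; `select_loadable` and the variable names used here are hypothetical and not part of the repository.

```python
import numpy as np

def select_loadable(pretrained, model_shapes, skip=("conv6_cls",)):
    """Split pretrained weights into those safe to assign and those left untouched."""
    loadable, skipped = {}, []
    for name, value in pretrained.items():
        # Skip the classification layer by name, as suggested in the thread.
        if any(s in name for s in skip):
            skipped.append(name)
            continue
        # Also skip anything missing from the model or with a mismatched shape.
        if name not in model_shapes or tuple(model_shapes[name]) != value.shape:
            skipped.append(name)
            continue
        loadable[name] = value
    return loadable, skipped

# Hypothetical variable names: a 19-class checkpoint and a 2-class model.
pretrained = {
    "conv5_4/weights": np.zeros((1, 1, 256, 128)),
    "conv6_cls/weights": np.zeros((1, 1, 128, 19)),
    "conv6_cls/biases": np.zeros((19,)),
}
model_shapes = {
    "conv5_4/weights": (1, 1, 256, 128),
    "conv6_cls/weights": (1, 1, 128, 2),
    "conv6_cls/biases": (2,),
}
loadable, skipped = select_loadable(pretrained, model_shapes)
print(sorted(loadable))
print(sorted(skipped))
```

Variables in the skipped list keep their freshly initialized values and are learned during fine-tuning.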

Hope this helps.


MarcSchotman commented on July 28, 2024 3

For me this worked:

1. In network.py, set ignore_missing to True:

def load(self, data_path, session, ignore_missing=True):

2. Edit INFER_SIZE, TRAINING_SIZE and the whole dict of others_param.
3. In train.py, change

restore_var = tf.global_variables()

to

restore_var = [v for v in tf.global_variables() if 'conv6_cls' not in v.name]

4. Run

python train.py --dataset others


ogail commented on July 28, 2024 2

I think you mean line 189, which is:
net.load(args.restore_from, sess)

I tried that and it results in the loss being 'nan'.

I also tried to load from a saved checkpoint (instead of the numpy file); however, the loss stays fixed at 0.511 and these
sub4 = 0.000
sub24 = 0.000
sub124 = 0.000
do not change at all.

Any ideas?


ogail commented on July 28, 2024 2

@BCJuan excuse me for the late reply. Yes, I made both changes as well.


qmy612 commented on July 28, 2024 1

@hellochick @ogail Hello, my question is: if my dataset has 2 classes, can I only use this network by training from scratch? Can't I reuse the earlier layers of the pre-trained model, or train just the last cls layer? In my experiments with caffe, I can train based on pre-trained models. I am not familiar with tf, but the deeplab_v3+ tensorflow code also supports training only the last layer.


hellochick commented on July 28, 2024

Hey @ogail,
By default, the code loads the pre-trained model and keeps fine-tuning it. However, the pre-trained cityscapes model has 19 classes, while your dataset has only 1. You can comment out line 191 to solve the problem and train from scratch.


hellochick commented on July 28, 2024

Before that, I want to know what your dataset looks like. Can you show some examples? If there is only one class, no training is needed, am I right?


ogail commented on July 28, 2024

The dataset has 2 classes, obstacles (0) and non-obstacles (255), in a binary format. Here is an example of a raw image

This is an example of a label image (similar to the *labelTrainIds* images in cityscapes)

Think of this as semantic segmentation with two labels (background and foreground). Hope it makes sense. FYI, I set IGNORE_LABEL to 0
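One way to prepare such masks is sketched below, under the assumption that the loss expects class indices in the range 0 to num_classes - 1: map the values 0 and 255 to indices 0 and 1, and reserve a separate value for pixels to ignore. `remap_binary_mask` is a hypothetical helper, the default of 100 simply mirrors a value mentioned elsewhere in this thread, and which raw value becomes class 1 is a choice for your dataset.

```python
import numpy as np

def remap_binary_mask(mask, ignore_label=100):
    """Map a 0/255 mask to class indices 0/1; any other value becomes ignore_label."""
    mask = np.asarray(mask)
    out = np.full(mask.shape, ignore_label, dtype=np.uint8)
    out[mask == 0] = 0      # first class
    out[mask == 255] = 1    # second class
    return out

raw = np.array([[0, 255, 255],
                [0, 128, 255]], dtype=np.uint8)
labels = remap_binary_mask(raw)
print(labels)
```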


hellochick commented on July 28, 2024

It makes sense to me. In this case, I think it's difficult to learn to detect obstacles, since the obstacles comprise several different kinds of objects. Hence, I think you can restore a model pre-trained on ImageNet or on ADE20k segmentation, and set the learning rate much lower for this task.

Btw, I have tried obstacle detection before; you can refer to Indoor Segmentation. In that project, I detect obstacles by training on ADE20k, with num_classes compressed from 150 to 27, just for your reference.


Danzip commented on July 28, 2024

I'm trying to do something similar with the LFW dataset http://vis-www.cs.umass.edu/lfw/part_labels/
I've set num_classes to 3 and rearranged the masks so that each mask is a grayscale image where 0 is hair, 1 is face and 2 is background. I also removed the net.load line from the code. The error I'm getting occurs when the line loss = tf.nn.sparse_softmax_cross_entropy_with_logits is called:
ValueError: Rank mismatch: Rank of labels (received 1) should equal rank of logits minus 1 (received 1).

Can you please explain what the function create_loss expects as input? What are the shapes of output and label? When I try it, I get label of shape (16,250,250,1) and output of shape (16,15,15,3).
After reshaping, raw_pred has shape=(10800,) but label has shape=(3600,). There is a mismatch here and I suspect it's why the function fails, but I can't seem to understand what to do.
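The numbers reported above are consistent with simple arithmetic: 16 × 15 × 15 = 3600 pixel positions in the output, and 3600 × 3 = 10800 values once the class axis is flattened as well. A sparse cross-entropy generally needs logits of shape (N, num_classes) and labels of shape (N,), with the labels resized to the output resolution. The NumPy sketch below illustrates that alignment; it is not the repository's create_loss, and `nearest_resize` is a hypothetical helper.

```python
import numpy as np

def nearest_resize(label, out_h, out_w):
    """Nearest-neighbour resize of a (B, H, W, 1) label map to (B, out_h, out_w, 1)."""
    _, h, w, _ = label.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return label[:, rows][:, :, cols]

num_classes = 3
output = np.random.randn(16, 15, 15, num_classes)                   # network output
label = np.random.randint(0, num_classes, size=(16, 250, 250, 1))   # full-size labels

# Keep the class axis on the logits; flatten only batch and spatial axes.
logits = output.reshape(-1, num_classes)
gt = nearest_resize(label, 15, 15).reshape(-1)
print(logits.shape, gt.shape)

# Numerically stable log-softmax, then pick the log-probability of the true class.
shifted = logits - logits.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
loss = -log_probs[np.arange(gt.size), gt].mean()
print(loss)
```

With the class axis preserved, the logits have rank 2 and the labels rank 1, which is the relationship the error message asks for.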


bhadresh74 commented on July 28, 2024

@ogail Thank you for the information you provided.
Any chance you could make your script public? It would help us a lot.
Thank you in advance.


ogail commented on July 28, 2024

@bhadresh74 is there a specific question you have?


bhadresh74 commented on July 28, 2024

@ogail Yes.
Couple of them actually.

1. I trained on two classes but my loss seems to be stuck at 0.6 and is not going down.
Here are my hyperparameters:
batch size: 64
Steps: 60000
Others are as given in the repo.
2. During inference, how can I extract the probability for each class? The given code returns 0 probability for each pixel for some reason. I would like to know how you extracted the softmax logits.

Thank you


ogail commented on July 28, 2024

@bhadresh74 Here are some suggestions:
1. Getting the loss to 0.6 is a good indication; pushing it further will require some tinkering, such as:

increasing the number of training steps
increasing the batch size
checking whether the ground-truth labels contain errors that are consistently failing.

2. I have not tried to extract the probability before.
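On the question of per-class probabilities: these are normally obtained by applying a softmax over the class axis of the raw network output, before any argmax. A minimal NumPy sketch, assuming the logits are available as an array of shape (H, W, num_classes); how to fetch that tensor from the inference code is not shown in this thread.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

# Toy logits for a 2x2 image with 2 classes.
logits = np.array([[[2.0, 0.0], [0.0, 0.0]],
                   [[-1.0, 3.0], [10.0, -10.0]]])
probs = softmax(logits)       # per-pixel class probabilities, same shape as logits
pred = probs.argmax(axis=-1)  # hard class decision per pixel
print(probs)
print(pred)
```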


BCJuan commented on July 28, 2024

Hi,
I would like to ask you a question, @ogail, since I had the same problems.
I see that you have done the following:

commenting net.load line
Setting number of classes to 2
Setting IGNORE_LABEL to arbitrary number not 0 or 255 (i set it to 100)

But have you also made the changes that you stated at the beginning? Mainly:

Updating icnet_cityscapes_bnnomerge.prototxt by changing conv6_cls num_output from 19 to 1

Then replaced this line in train.py

restore_var = tf.global_variables()

with

restore_var = [v for v in tf.global_variables() if 'conv6_cls' not in v.name]


ogail commented on July 28, 2024

Yes, you will have to train from scratch


ogail commented on July 28, 2024

@BCJuan did fine-tuning from the pretrained model boost results on your custom task? Have you tried comparing that with training from scratch?


BCJuan commented on July 28, 2024

Yes, it boosted the results. Indeed I was not obtaining any good results without the pretrained model.

I used the icnet_cityscapes_trainval_bnomerge_90k, but I think that any other model can be used.


ogail commented on July 28, 2024

@BCJuan what's the mIoU before and after using the cityscapes pretrained model?


ogail commented on July 28, 2024

@BCJuan I did load the pretrained model; however, I didn't see much difference between fine-tuning and training from scratch.


BCJuan commented on July 28, 2024

@ogail I do not know, since I am just fine-tuning. But with a one-hour run, I achieve about 20% mIoU using the pretrained model, versus 6% without it. Maybe I am doing something wrong.


qmy612 commented on July 28, 2024

@BCJuan Thank you very much, I will try tomorrow.


seovchinnikov commented on July 28, 2024

I'll try fine-tuning too and will report the results. In my experience, fine-tuning tends to improve the model's generalization, so it's worth trying.


VincentGu11 commented on July 28, 2024

Hi @ogail,
Thank you very much for sharing your training steps with us.
Recently I needed to solve the same problem as you. I set my network parameters the same as yours, and the loss got down to 0.17 and keeps going down.
However, when I run inference with my net, the result shows the whole image as 0 or 1, which does not seem right.
Did you have this problem? Thank you!


PratibhaT commented on July 28, 2024

@ogail Have you tried training it for multiple classes? What annotation tool can I use for training on multiple classes? Also, what accuracy and fps are you getting on evaluation?


ogail commented on July 28, 2024

@VincentGu11 Is the 0 and/or 1 how the final rendered image looks? There's a function decode_label that converts training indices to RGB colors.

@PratibhaT Yes, I tried. You could use the labelme tool. The accuracy and fps depend on the data and the problem, so my numbers won't be relevant in a general sense.
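A prediction that holds raw class indices 0 and 1 will look almost entirely black if it is saved directly as an 8-bit image, which may be what is being described above. Below is a sketch of the index-to-colour step, assuming a (H, W) array of class indices; `colorize` and the palette are illustrative and are not the repository's decode_label.

```python
import numpy as np

# Illustrative palette: class 0 -> black, class 1 -> white.
PALETTE = np.array([[0, 0, 0], [255, 255, 255]], dtype=np.uint8)

def colorize(pred, palette=PALETTE):
    """Turn a (H, W) array of class indices into a (H, W, 3) RGB image."""
    pred = np.asarray(pred)
    if pred.max() >= len(palette):
        raise ValueError("prediction contains a class index with no palette entry")
    # Fancy indexing looks up one RGB triple per pixel.
    return palette[pred]

pred = np.array([[0, 1], [1, 0]])
rgb = colorize(pred)
print(rgb.shape)
```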


PratibhaT commented on July 28, 2024

@ogail I used the VIA annotation tool, which gives a .json file. But in this code the list.txt refers to a .png image for the label. Is there a way to convert .json annotation files to .png to be used as labels? What is the output of the labelme tool?


adisrivasa commented on July 28, 2024

@ogail I am training on my own dataset consisting of 8 classes. I made all the required changes mentioned above, but I am still getting the following error:

Assign requires shapes of both tensors to match. lhs shape= [8] rhs shape= [19]

Is there some particular change that I missed?


Soulempty commented on July 28, 2024

@qmy612, can you share some details about your training with the caffe framework?
I have a problem training with the matcaffe I downloaded.


yeyuanzheng177 commented on July 28, 2024

@ogail Thank you for the information you provided. Can I ask you two questions?
1. Did you use ADE20k or any other pre-trained model for fine-tuning when training on your own dataset?

2. What is the basis for setting the IGNORE_LABEL value?
Looking forward to your answer.


ogail commented on July 28, 2024

@PratibhaT I did not search for such a tool; I'd just do the conversion myself to get going.

@adisrivasa It seems that either (1) the number of classes in train.py is not set to 8, or (2) the prototxt file for the pretrained checkpoint is not updated to use 8 classes instead of 19.

@yeyuanzheng177 (1) I did not use ADE20k for fine-tuning; instead I used cityscapes. (2) I set this value to 255
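For the .json to .png conversion asked about above, doing it by hand mostly comes down to rasterizing each annotated polygon into an array of class indices. The NumPy sketch below assumes the polygons have already been read out of the annotation file as lists of x and y coordinates; `polygon_mask` and `rasterize` are hypothetical helpers, and parsing the tool's JSON is deliberately left out because it depends on the export format.

```python
import numpy as np

def polygon_mask(height, width, xs, ys):
    """Boolean mask of pixels whose centre lies inside the polygon (even-odd rule)."""
    yy, xx = np.mgrid[0:height, 0:width]
    px, py = xx + 0.5, yy + 0.5
    inside = np.zeros((height, width), dtype=bool)
    j = len(xs) - 1
    for i in range(len(xs)):
        xi, yi, xj, yj = xs[i], ys[i], xs[j], ys[j]
        # True where the edge (j -> i) straddles the horizontal line through the pixel centre.
        crosses = (yi > py) != (yj > py)
        # Horizontal edges divide by zero here, but 'crosses' is False for them.
        with np.errstate(divide="ignore", invalid="ignore"):
            x_at_py = (xj - xi) * (py - yi) / (yj - yi) + xi
        # Toggle pixels that have this edge crossing to their right.
        inside ^= crosses & (px < x_at_py)
        j = i
    return inside

def rasterize(height, width, polygons, background=0):
    """polygons: iterable of (class_index, xs, ys); later polygons overwrite earlier ones."""
    label = np.full((height, width), background, dtype=np.uint8)
    for class_index, xs, ys in polygons:
        label[polygon_mask(height, width, xs, ys)] = class_index
    return label

# A single square of class 1 on a 6x6 background.
label = rasterize(6, 6, [(1, [1, 4, 4, 1], [1, 1, 4, 4])])
print(label)
```

The resulting array can then be written out as a single-channel PNG with any image library.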


Soulempty commented on July 28, 2024

@qmy612, hello, can you give some advice on how to train ICNet on caffe? Thank you for your help.


yeyuanzheng177 commented on July 28, 2024

@ogail
Thank you for the information you provided.
My dataset is the same as yours: (0,0,0) and (255,255,255) represent the two label categories.
When I set IGNORE_LABEL = 0, the result is
Sub4 = nan
Sub24 = nan
Sub124 = nan
When I set IGNORE_LABEL to a value other than 0, the result is
Step 0 total loss = 3.639, sub4 = 0.471, sub24 = 0.857, sub124 = 1.916 (3.606 sec/step)
Step 1 total loss = 1.897, sub4 = 0.281, sub24 = 0.521, sub124 = 0.451 (0.161 sec/step)
Step 2 total loss = 1.342, sub4 = 0.180, sub24 = 0.267, sub124 = 0.088 (0.162 sec/step)
Step 3 total loss = 1.328, sub4 = 0.181, sub24 = 0.330, sub124 = 0.033 (0.158 sec/step)
Step 4 total loss = 1.173, sub4 = 0.108, sub24 = 0.160, sub124 = 0.007 (0.161 sec/step)
Step 5 total loss = 1.132, sub4 = 0.129, sub24 = 0.074, sub124 = 0.006 (0.159 sec/step)
Step 6 total loss = 1.411, sub4 = 0.056, sub24 = 0.028, sub124 = 0.339 (0.160 sec/step)
Step 7 total loss = 1.055, sub4 = 0.033, sub24 = 0.009, sub124 = 0.001 (0.158 sec/step)
Step 8 total loss = 1.049, sub4 = 0.018, sub24 = 0.004, sub124 = 0.001 (0.160 sec/step)
Step 9 total loss = 1.055, sub4 = 0.025, sub24 = 0.006, sub124 = 0.000 (0.158 sec/step)
These results have troubled me.
I set up the program according to your description, both for training from scratch and for fine-tuning, but it did not work.
Can you share your tips for training the network?
Looking forward to your answer.


seushengchao commented on July 28, 2024

@VincentGu11 Hello. Did you solve the problem (the result shows the whole image as 0 or 1)? Thank you! I have the same problem as you.


abreheret commented on July 28, 2024

Cool, you have succeeded!

I am also training on my own data, and I would like to know how many annotated images you needed for a satisfactory result (@ogail)?


erichhhhho commented on July 28, 2024

Sorry, I was wondering whether you were using the pretrained icnet_cityscapes_bnnomerge.prototxt instead of icnet_cityscapes_trainval_90k_bnnomerge.npy @ogail @hellochick

So, how could I update the pretrained model by changing conv6_cls num_output from 19 to 1? @BCJuan


hellochick commented on July 28, 2024

Hey @erichhhhho,

You need to change the restore variables, like restore_var = [v for v in tf.global_variables() if 'conv6_cls' not in v.name]; that way you can restore the pre-trained weights except for the last layer.


amwfarid commented on July 28, 2024

I still get the same problem even though I set restore_var without conv6_cls (For model retraining using .npy). Am I missing something?


kangyang94 commented on July 28, 2024

@ogail

The project no longer has inference.py and tool.py. Do you still have the version you used?


amwfarid commented on July 28, 2024

@kangyang94

At least for inference.py, it actually exists as a python notebook (demo.ipynb).


prz30 commented on July 28, 2024

Hi @ogail
Please forgive me for disturbing you. Could you give me a copy of your code as it was at that time? Since the author has iterated on the repository, I found that some of the code changes described here may no longer apply. Thank you! My email address is [email protected]
Finally, please forgive my poor English.


seushengchao commented on July 28, 2024

Hello, have you solved the problem? It has been troubling me for a long time too!


Mythos-Rudy commented on July 28, 2024

The main reason for this problem is the function create_loss() in train.py: the author ignores the background when computing the loss, so if your NUM_CLASSES is 2 and one of the classes is ignored, the loss will be very low whenever the model always predicts the other class. So your loss reaches 0.5 (because the l2 loss is 0.5), but the model has learned nothing. To solve the problem, you have to change the code that ignores one of the classes.
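The effect described above can be reproduced in a few lines. This NumPy sketch is an illustration rather than the repository's create_loss: with two classes and one of them ignored, a model that always predicts the remaining class reaches a near-zero cross-entropy without learning anything, and if every pixel ends up ignored there is nothing left to average, which is one possible source of nan. `masked_cross_entropy` is a hypothetical helper.

```python
import numpy as np

def masked_cross_entropy(probs, labels, ignore_label):
    """Mean cross-entropy over pixels whose label differs from ignore_label."""
    keep = labels != ignore_label
    if not keep.any():
        # A mean over an empty selection is undefined; returned explicitly as nan here.
        return float("nan")
    kept_probs, kept_labels = probs[keep], labels[keep]
    picked = kept_probs[np.arange(kept_labels.size), kept_labels]
    return float(-np.log(picked).mean())

labels = np.array([0, 0, 1, 1, 1, 0])
# A degenerate "model" that predicts class 1 for every pixel.
always_one = np.tile([0.001, 0.999], (labels.size, 1))

loss_ignore_bg = masked_cross_entropy(always_one, labels, ignore_label=0)
loss_all = masked_cross_entropy(always_one, labels, ignore_label=100)
loss_empty = masked_cross_entropy(always_one, np.zeros(6, dtype=int), ignore_label=0)
print(loss_ignore_bg, loss_all, loss_empty)
```

When class 0 is ignored, the degenerate predictor looks nearly perfect; once both classes count towards the loss, the same predictor is penalized heavily.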


SpencerTrihus commented on July 28, 2024

I am trying to train ICNet on a custom dataset with 2 classes, the background and the object, but I received an error due to cityscapes having 19 classes while mine only includes 2. I have followed the instructions above, which seem to have solved the class problem but now during training all parameters are nan.

I do not understand what is recommended in #20 and in this thread regarding making changes to .prototxt and .npy files. If this is necessary, could you explain how to make this change? If not, what is causing the nan loss results?

Thanks!


gitunit commented on July 28, 2024

How do I change the number of classes for the pre-trained model? I've found the .prototxt file from the original work, but I don't know what to do with it or where to load it.
