
Comments (46)

ogail avatar ogail commented on July 28, 2024 8

@hellochick I finally got it working. Here are the steps I took:

  • Commented out the net.load line
  • Set the number of classes to 2
  • Set IGNORE_LABEL to an arbitrary number other than 0 or 255 (I set it to 100)

Then I trained the network and got good prediction results (I had to update inference.py and tools.py to get this working):
img_00197
Here is the original image:
img_00197
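
For readers reproducing the label setup above, here is a minimal sketch of one way to turn a 0/255 mask into class indices while reserving 100 as the ignore value. The thread does not say whether the masks were actually remapped like this; remap_mask and VALUE_TO_CLASS are hypothetical names, and the 0 = obstacle / 255 = non-obstacle encoding is taken from another comment in this thread.

```python
# Hypothetical helper, not part of the ICNet repo.
NUM_CLASSES = 2
IGNORE_LABEL = 100  # value mentioned above; assumed to lie outside [0, NUM_CLASSES)

# Assumed raw encoding: 0 = obstacle, 255 = non-obstacle.
VALUE_TO_CLASS = {0: 0, 255: 1}


def remap_mask(mask):
    """Map raw pixel values to class indices; unknown values become IGNORE_LABEL."""
    return [[VALUE_TO_CLASS.get(value, IGNORE_LABEL) for value in row] for row in mask]


raw = [[0, 255, 255],
       [0, 0, 128]]  # 128 stands in for an unexpected value, e.g. a blurred edge
remapped = remap_mask(raw)
print(remapped)  # [[0, 1, 1], [0, 0, 100]]
```

The general point is that tf.nn.sparse_softmax_cross_entropy_with_logits expects labels in [0, num_classes), so any value meant to be ignored has to be masked out before the loss is computed.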

What I did for training is the following:

  • Ran python train.py for 8 hours until the loss reached 0.281, then stopped.
  • Ran python train.py --update-mean-var --train-beta-gamma (still running); the loss has dropped to 0.27 and is still decreasing.

When you trained on other datasets, how did you use train.py and train.py --update-mean-var --train-beta-gamma (that is, for how long and for what purpose)?

from icnet-tensorflow.

BCJuan avatar BCJuan commented on July 28, 2024 7

Hi,
In response to @qmy612 (also @ogail ): you can indeed use the pretrained model.

I achieved it yesterday doing the following:

  • As in #20, update icnet_cityscapes_bnnomerge.prototxt by changing conv6_cls num_output from 19 to your number of classes (this is from @ogail's initial question).
  • Then go to network.py, to the load function of class Network, and add the line if 'conv6_cls' not in var.name: before the line session.run(var.assign(data)). Also change ignore_missing to True.

The function should look something like:

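def load(self, data_path, session, ignore_missing=True):
    # Load the pretrained weights: a dict of {op_name: {param_name: array}}.
    # Newer NumPy releases may additionally require allow_pickle=True here.
    data_dict = np.load(data_path, encoding='latin1').item()
    for op_name in data_dict:
        with tf.variable_scope(op_name, reuse=True):
            for param_name, data in data_dict[op_name].items():
                try:
                    if 'bn' in op_name:
                        # Rename batch-norm parameters via BN_param_map.
                        param_name = BN_param_map[param_name]

                    var = tf.get_variable(param_name)
                    # Skip the final classification layer, whose shape
                    # depends on the number of classes.
                    if 'conv6_cls' not in var.name:
                        session.run(var.assign(data))
                except ValueError:
                    # Variable not found in the graph: skip it unless asked not to.
                    if not ignore_missing:
                        raise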
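
The same idea, stripped of TensorFlow, is to copy only those pretrained entries whose name does not belong to the classifier and whose shape matches the new graph. This is only a sketch: select_weights, the variable names and the shapes below are made up for illustration and are not taken from the ICNet checkpoint.

```python
def select_weights(pretrained, model_shapes, skip_substring="conv6_cls"):
    """Return the pretrained entries that are safe to copy into the model.

    pretrained:   {name: (shape, values)} from the checkpoint (toy data here)
    model_shapes: {name: shape} of the variables in the new graph
    """
    selected = {}
    for name, (shape, values) in pretrained.items():
        if skip_substring in name:
            continue  # classifier layer: keep its fresh initialisation
        if model_shapes.get(name) != shape:
            continue  # missing in the model or shape mismatch: skip
        selected[name] = values
    return selected


# Toy checkpoint with a 19-class classifier and a toy model with 2 classes.
pretrained = {
    "conv1/weights": ((3, 3, 3, 32), "w1"),
    "conv6_cls/weights": ((1, 1, 128, 19), "w_cls"),
    "conv6_cls/biases": ((19,), "b_cls"),
}
model_shapes = {
    "conv1/weights": (3, 3, 3, 32),
    "conv6_cls/weights": (1, 1, 128, 2),
    "conv6_cls/biases": (2,),
}
selected = select_weights(pretrained, model_shapes)
print(sorted(selected))  # ['conv1/weights']
```

Filtering by name alone is what the snippet above does; the extra shape check is an addition of this sketch that would also catch other mismatched layers.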

Then you can optionally make the change described in #20, that is, replacing:

restore_var = tf.global_variables()

with

restore_var = [v for v in tf.global_variables() if 'conv6_cls' not in v.name]

Either way the effect should be the same, since conv6_cls, the last (classification) layer of the net, is never loaded from the pretrained model.

Hope this helps.

from icnet-tensorflow.

MarcSchotman avatar MarcSchotman commented on July 28, 2024 3

For me this worked:

  1. In network.py, set ignore_missing to True:

def load(self, data_path, session, ignore_missing=True):

  2. Edit INFER_SIZE, TRAINING_SIZE and the whole dict of others_param.

  3. In train.py, change

restore_var = tf.global_variables()

to

restore_var = [v for v in tf.global_variables() if 'conv6_cls' not in v.name]

  4. Run

python train.py --dataset others

from icnet-tensorflow.

ogail avatar ogail commented on July 28, 2024 2
  • I think you mean line 189, which is:
    net.load(args.restore_from, sess)

    I tried that and the loss becomes NaN.

  • I also tried loading from a saved checkpoint (instead of the NumPy file); however, the loss stays fixed at 0.511 and these values
    sub4 = 0.000
    sub24 = 0.000
    sub124 = 0.000
    do not change at all.

Any ideas?

from icnet-tensorflow.

ogail avatar ogail commented on July 28, 2024 2

@BCJuan excuse me for late reply. Yes I did both changes as well.

from icnet-tensorflow.

qmy612 avatar qmy612 commented on July 28, 2024 1

@hellochick @ogail Hello, my question is: if my dataset has 2 classes, can I only use this network by training from scratch? Can't I reuse the earlier layers of the pre-trained model, or train just the last cls layer? In my experiments with Caffe I can train starting from pre-trained models. I am not familiar with TF, but the deeplab_v3+ TensorFlow code also supports training only the last layer.

from icnet-tensorflow.

hellochick avatar hellochick commented on July 28, 2024

Hey @ogail,
By default, the script loads the pre-trained model and keeps fine-tuning it. However, the pre-trained Cityscapes model has 19 classes, while your dataset has only 1. You can comment out line 191 to solve the problem and train from scratch.

from icnet-tensorflow.

hellochick avatar hellochick commented on July 28, 2024

Before that, I would like to know what your dataset looks like. Can you show some examples? If there is only one class, there is nothing left to train, am I right?

from icnet-tensorflow.

ogail avatar ogail commented on July 28, 2024

The dataset has 2 classes, obstacles (0) and non-obstacles (255), in a binary format. Here is an example of a raw image:
img_00002
This is an example of a label image (similar to the labelTrainIds images in Cityscapes):
img_00002

Think of this as semantic segmentation with two labels (background and foreground). Hope it makes sense. FYI, I set IGNORE_LABEL to 0.

from icnet-tensorflow.

hellochick avatar hellochick commented on July 28, 2024

It makes sense to me. For this case, I think it is difficult to learn to detect obstacles, since obstacles comprise several different kinds of objects. Hence, I think you can restore a model pre-trained on ImageNet or on ADE20k segmentation, and set the learning rate much lower for this task.

Btw, I have tried obstacle detection before; you can refer to Indoor Segmentation. In that project I detect obstacles by training on ADE20k, and I compressed num_classes from 150 to 27, just for your reference.

from icnet-tensorflow.

Danzip avatar Danzip commented on July 28, 2024

I'm trying to do something similar with the LFW dataset: http://vis-www.cs.umass.edu/lfw/part_labels/
I've set num_classes to 3 and rearranged the masks so that each mask is a grayscale image where 0 is hair, 1 is face and 2 is background. I also removed the net.load line from the code. The error I'm getting occurs when the line loss = tf.nn.sparse_softmax_cross_entropy_with_logits is called:
ValueError: Rank mismatch: Rank of labels (received 1) should equal rank of logits minus 1 (received 1).

Can you please explain what the function create_loss expects as input? What are the shapes of output and label? When I try it I get a label of shape (16,250,250,1) and an output of shape (16,15,15,3).
After reshaping, raw_pred has shape (10800,) but label has shape (3600,). There is a mismatch here and I suspect that is why the function fails, but I can't work out what to do.
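
The numbers in this report are consistent with the logits having been flattened all the way to rank 1, whereas a sparse softmax cross-entropy expects rank-2 logits (one row per pixel, one column per class) and rank-1 labels. The sketch below only does the shape arithmetic; loss_input_shapes is a hypothetical helper, not the repo's create_loss, and it assumes the label map has already been resized to the 15×15 output resolution (the reported label size, 3600 = 16·15·15, suggests it had).

```python
def loss_input_shapes(output_shape, num_classes):
    """Return the (logits, labels) shapes a sparse softmax cross-entropy expects."""
    batch, height, width, channels = output_shape
    if channels != num_classes:
        raise ValueError("last output dimension must equal num_classes")
    pixels = batch * height * width
    # Logits: one row per pixel, one column per class (rank 2).
    # Labels: one class index per pixel (rank 1).
    return (pixels, num_classes), (pixels,)


logits_shape, labels_shape = loss_input_shapes((16, 15, 15, 3), num_classes=3)
print(logits_shape, labels_shape)  # (3600, 3) (3600,)

# Flattening the logits completely gives 16 * 15 * 15 * 3 elements,
# which matches the rank-1 size reported above.
fully_flattened = logits_shape[0] * logits_shape[1]
print(fully_flattened)  # 10800
```

In TensorFlow terms this would mean reshaping the output with tf.reshape(output, [-1, num_classes]) rather than to a flat vector. Why the reshape produced a flat vector in this particular case is not stated in the thread.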

from icnet-tensorflow.

bhadresh74 avatar bhadresh74 commented on July 28, 2024

@ogail Thank you for the information you provided.
Any chance you could make your script public? It would help us a lot.
Thank you in advance

from icnet-tensorflow.

ogail avatar ogail commented on July 28, 2024

@bhadresh74 is there a specific question you have?

from icnet-tensorflow.

bhadresh74 avatar bhadresh74 commented on July 28, 2024

@ogail Yes.
A couple of them, actually.

  1. I trained on two classes but my loss seems to be stuck at 0.6 and is not going down.
    Here are my hyperparameters:
    batch size: 64
    Steps: 60000
    Others are as given in the repo.

  2. During inference, how can I extract the probability for each class? The given code returns 0 probability for every pixel for some reason. I would like to know how you extracted the softmax outputs.

Thank you

from icnet-tensorflow.

ogail avatar ogail commented on July 28, 2024

@bhadresh74 Here are some suggestions:
1- Getting the loss to 0.6 is a good indication; pushing it lower will require some tinkering, like:

  • increasing the number of training steps
  • increasing the batch size
  • checking whether the ground-truth labels contain errors that are consistently causing failures.

2- I have not tried to extract the probabilities before.
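
On the probability question, per-class probabilities are normally obtained by applying a softmax to the raw per-pixel logits before any argmax. Below is a framework-free sketch, assuming the logits for one pixel are available as a list of floats; which tensor to fetch in this repo is not covered in the thread, and the example logits are invented.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


pixel_logits = [2.0, 0.0]  # invented logits for a 2-class problem
probs = softmax(pixel_logits)
print(probs)  # two probabilities that sum to 1, the first one larger
```

In TensorFlow the equivalent op is tf.nn.softmax applied over the class axis of the logits.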

from icnet-tensorflow.

BCJuan avatar BCJuan commented on July 28, 2024

Hi,
I would like to ask you a question, @ogail, since I had the same problems.
I see that you have done the following:

  • commenting net.load line
  • Setting number of classes to 2
  • Setting IGNORE_LABEL to arbitrary number not 0 or 255 (i set it to 100)

But have you also made the changes that you stated at the beginning? Mainly:

  • Updating icnet_cityscapes_bnnomerge.prototxt by changing conv6_cls num_output from 19 to 1
  • Then replacing this line in train.py

restore_var = tf.global_variables()

with

restore_var = [v for v in tf.global_variables() if 'conv6_cls' not in v.name]

from icnet-tensorflow.

ogail avatar ogail commented on July 28, 2024

Yes, you will have to train from scratch

from icnet-tensorflow.

ogail avatar ogail commented on July 28, 2024

@BCJuan Did fine-tuning from the pretrained model boost results on your custom task? Have you tried comparing that with training from scratch?

from icnet-tensorflow.

BCJuan avatar BCJuan commented on July 28, 2024

Yes, it boosted the results. Indeed I was not obtaining any good results without the pretrained model.

I used the icnet_cityscapes_trainval_bnomerge_90k, but I think that any other model can be used.

from icnet-tensorflow.

ogail avatar ogail commented on July 28, 2024

@BCJuan What is the mIoU before and after using the Cityscapes pretrained model?

from icnet-tensorflow.

ogail avatar ogail commented on July 28, 2024

@BCJuan I did load the pretrained model; however, I didn't see much difference between fine-tuning and training from scratch.

from icnet-tensorflow.

BCJuan avatar BCJuan commented on July 28, 2024

@ogail I do not know, since I am only fine-tuning. But with a one-hour run, I reach about 20% mIoU using the pretrained model and about 6% without it. Maybe I am doing something wrong.
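
For anyone comparing runs the same way, mIoU is the per-class intersection-over-union averaged over classes. Here is a minimal sketch on flat lists of class indices; mean_iou is a hypothetical helper, not the evaluation code behind the numbers above, and skipping classes that appear in neither prediction nor target is an assumption of this sketch.

```python
def mean_iou(pred, target, num_classes, ignore_label=None):
    """Mean intersection-over-union over the classes that actually occur."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # ignored pixels do not count for any class
        for c in range(num_classes):
            if p == c and t == c:
                intersection[c] += 1
            if p == c or t == c:
                union[c] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
miou = mean_iou(pred, target, num_classes=2)
print(miou)  # 0.5
```

In this toy case each class has 2 pixels in the intersection and 4 in the union, so both IoUs are 0.5 and so is their mean.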

from icnet-tensorflow.

qmy612 avatar qmy612 commented on July 28, 2024

@BCJuan Thank you very much, I will try tomorrow.

from icnet-tensorflow.

seovchinnikov avatar seovchinnikov commented on July 28, 2024

I'll try fine-tuning too and will report the results. In my experience, fine-tuning tends to improve how well the model generalizes, so it's worth trying.

from icnet-tensorflow.

VincentGu11 avatar VincentGu11 commented on July 28, 2024

Hi @ogail,
Thank you very much for sharing your training steps with us.
Recently I have needed to solve the same problem as you. I set my network parameters the same as yours, and the loss gets down to 0.17 and keeps decreasing.
However, when I run inference, the result shows every pixel of the image as 0 or 1, which does not seem right.
Did you have this problem? Thank you!

from icnet-tensorflow.

PratibhaT avatar PratibhaT commented on July 28, 2024

@ogail Have you tried training it for multiple classes? What annotation tool can I use to train it on multiple classes? Also, what accuracy and fps are you getting on evaluation?

from icnet-tensorflow.

ogail avatar ogail commented on July 28, 2024

@VincentGu11 Is 0 and/or 1 what the final rendered image looks like? There is a function, decode_label, that converts training indices to RGB colors.
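
To illustrate the rendering point: a prediction map holds class indices such as 0 and 1, which look almost uniformly black if saved directly as an 8-bit image. Below is a sketch of an index-to-colour lookup; colorize and PALETTE are hypothetical, not the repo's decode_label, and the colours are arbitrary.

```python
# Hypothetical palette: class index -> (R, G, B). Colours chosen arbitrarily.
PALETTE = {0: (0, 0, 0), 1: (255, 255, 255)}


def colorize(mask, palette=PALETTE):
    """Replace every class index in a 2-D mask by its RGB colour."""
    return [[palette[index] for index in row] for row in mask]


pred = [[0, 1],
        [1, 1]]
rgb = colorize(pred)
print(rgb[0])  # [(0, 0, 0), (255, 255, 255)]
```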

@PratibhaT Yes, I tried. You could use the labelme tool. The accuracy and fps depend on the data and the problem, so my numbers won't be relevant in a general sense.

from icnet-tensorflow.

PratibhaT avatar PratibhaT commented on July 28, 2024

@ogail I used the VIA annotation tool, which produces a .json file. But in this code list.txt refers to a .png image for each label. Is there a way to convert .json annotation files to .png so they can be used as labels? What is the output of the labelme tool?

from icnet-tensorflow.

adisrivasa avatar adisrivasa commented on July 28, 2024

@ogail I am training on my own dataset, which consists of 8 classes. I made all the required changes mentioned above, but I am still getting the following error:

Assign requires shapes of both tensors to match. lhs shape= [8] rhs shape= [19]

Is there some particular change that I missed?

from icnet-tensorflow.

Soulempty avatar Soulempty commented on July 28, 2024

@qmy612, can you share some details about your training with the Caffe framework?
I am having problems training with the matcaffe I downloaded.

from icnet-tensorflow.

yeyuanzheng177 avatar yeyuanzheng177 commented on July 28, 2024

@ogail Thank you for the information you provided. Can I ask you two questions?
1. Did you use ADE20k or any other pre-trained model for fine-tuning when training on your own dataset?

2. What is the basis for choosing the IGNORE_LABEL value?
Looking forward to your answer.

from icnet-tensorflow.

ogail avatar ogail commented on July 28, 2024

@PratibhaT I did not search for such a tool; I'd just do the conversion myself to get going.

@adisrivasa It seems that either (1) the number of classes in train.py is not set to 8, or (2) the prototxt file for the pretrained checkpoint has not been updated to use 8 classes instead of 19.

@yeyuanzheng177 (1) I did not use ADE20k for fine-tuning; I used Cityscapes instead. (2) I set this value to 255.

from icnet-tensorflow.

Soulempty avatar Soulempty commented on July 28, 2024

@qmy612, hello, can you give some advice on how to train ICNet with Caffe? Thank you for your help.

from icnet-tensorflow.

yeyuanzheng177 avatar yeyuanzheng177 commented on July 28, 2024

@ogail
Thank you for the information you provided.
My dataset is the same as yours: the two categories are labelled (0,0,0) and (255,255,255).
When I set IGNORE_LABEL = 0, the result is
Sub4 =nan
Sub24 =nan
Sub124 =nan
When I set IGNORE_LABEL not to 0, the result is
Step 0 total loss = 3.639, sub4 = 0.471, sub24 = 0.857, sub124 = 1.916 (3.606 sec/step)
Step 1 total loss = 1.897, sub4 = 0.281, sub24 = 0.521, sub124 = 0.451 (0.161 sec/step)
Step 2 total loss = 1.342, sub4 = 0.180, sub24 = 0.267, sub124 = 0.088 (0.162 sec/step)
Step 3 total loss = 1.328, sub4 = 0.181, sub24 = 0.330, sub124 = 0.033 (0.158 sec/step)
Step 4 total loss = 1.173, sub4 = 0.108, sub24 = 0.160, sub124 = 0.007 (0.161 sec/step)
Step 5 total loss = 1.132, sub4 = 0.129, sub24 = 0.074, sub124 = 0.006 (0.159 sec/step)
Step 6 total loss = 1.411, sub4 = 0.056, sub24 = 0.028, sub124 = 0.339 (0.160 sec/step)
Step 7 total loss = 1.055, sub4 = 0.033, sub24 = 0.009, sub124 = 0.001 (0.158 sec/step)
Step 8 total loss = 1.049, sub4 = 0.018, sub24 = 0.004, sub124 = 0.001 (0.160 sec/step)
Step 9 total loss = 1.055, sub4 = 0.025, sub24 = 0.006, sub124 = 0.000 (0.158 sec/step)
These results puzzle me.
I set up the program according to your description, both for training from scratch and for fine-tuning, but it did not work.
Could you share your tips for training the network?
Looking forward to your answer.

from icnet-tensorflow.

seushengchao avatar seushengchao commented on July 28, 2024

@VincentGu11 Hello. Did you solve the problem (the result shows every pixel of the image as 0 or 1)? Thank you! I have the same problem as you.

from icnet-tensorflow.

abreheret avatar abreheret commented on July 28, 2024

Cool, you have succeeded!

I am also training on my own data, and I would like to know how many annotated images you needed for a satisfactory result (@ogail)?

from icnet-tensorflow.

erichhhhho avatar erichhhhho commented on July 28, 2024

Sorry, I was wondering whether you were using the pretrained icnet_cityscapes_bnnomerge.prototxt instead of icnet_cityscapes_trainval_90k_bnnomerge.npy, @ogail @hellochick.

So, how can I update the pretrained model by changing conv6_cls num_output from 19 to 1, @BCJuan?

from icnet-tensorflow.

hellochick avatar hellochick commented on July 28, 2024

Hey @erichhhhho,

You need to change the restored variables to restore_var = [v for v in tf.global_variables() if 'conv6_cls' not in v.name]; that way you restore the pre-trained weights except for the last layer.

from icnet-tensorflow.

amwfarid avatar amwfarid commented on July 28, 2024

I still get the same problem even though I set restore_var without conv6_cls (for model retraining using the .npy file). Am I missing something?

from icnet-tensorflow.

kangyang94 avatar kangyang94 commented on July 28, 2024

@ogail

The project no longer has inference.py and tools.py. Do you still have the version you used?

from icnet-tensorflow.

amwfarid avatar amwfarid commented on July 28, 2024

@kangyang94

At least for inference.py, it actually exists as a python notebook (demo.ipynb).

from icnet-tensorflow.

prz30 avatar prz30 commented on July 28, 2024

Hi @ogail,
Please forgive me for disturbing you. Could you give me a copy of your code as it was at that time? Since the author has iterated on the repository since then, I have found that some of the code changes described may no longer match. Thank you! My email address is [email protected]
Finally, please forgive my poor English.

from icnet-tensorflow.

seushengchao avatar seushengchao commented on July 28, 2024

@yeyuanzheng177 Hello, have you solved the problem? It has been troubling me for a long time too!

from icnet-tensorflow.

Mythos-Rudy avatar Mythos-Rudy commented on July 28, 2024

The main reason for this problem is the function create_loss() in train.py: the author ignores the background when computing the loss. So if your NUM_CLASSES is 2 and one of the classes is ignored, the loss will be very low if the model always predicts the other class for every pixel. Your loss then settles around 0.5 (because the L2 loss is 0.5), but the model has learned nothing. To solve the problem, you have to change the code that ignores one class.
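
The effect described here can be reproduced with a toy calculation: with two classes and one of them masked out, a model that predicts the remaining class everywhere gets a near-zero cross-entropy. masked_cross_entropy is a hypothetical stand-in written for this illustration, not the repo's create_loss, and the labels and probabilities are invented.

```python
import math


def masked_cross_entropy(probs, labels, ignore_label):
    """Mean of -log p[label] over the pixels whose label is not ignore_label."""
    losses = [-math.log(p[y]) for p, y in zip(probs, labels) if y != ignore_label]
    # With every pixel masked out there is nothing to average.
    return sum(losses) / len(losses) if losses else float("nan")


labels = [0, 1, 1, 0, 1]
# A degenerate model that predicts class 1 for every pixel.
always_one = [[1e-6, 1.0 - 1e-6]] * len(labels)

loss_ignoring_0 = masked_cross_entropy(always_one, labels, ignore_label=0)
loss_full = masked_cross_entropy(always_one, labels, ignore_label=100)
print(loss_ignoring_0)  # close to 0: the mistakes on class 0 are never counted
print(loss_full)        # roughly 5.5: the mistakes on class 0 dominate
```

With the ignore value moved outside the set of real classes, the degenerate predictor is penalised, so the loss can no longer be driven down without learning both classes.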

from icnet-tensorflow.

SpencerTrihus avatar SpencerTrihus commented on July 28, 2024

I am trying to train ICNet on a custom dataset with 2 classes, the background and the object, but I received an error because Cityscapes has 19 classes while my dataset has only 2. I have followed @MarcSchotman's instructions above, which seem to have solved the class problem, but now all the loss values are NaN during training.

I do not understand what is recommended in #20 and in this thread regarding changes to the .prototxt and .npy files. If this is necessary, could you explain how to make this change? If not, what is causing the NaN loss results?

Thanks!

from icnet-tensorflow.

gitunit avatar gitunit commented on July 28, 2024

How do I change the number of classes for the pre-trained model? I've found the .prototxt file from the original work, but I don't know what to do with it or where to load it.

from icnet-tensorflow.
