Comments (46)
@hellochick I finally got it working. Here are the steps I took:
- Commented out the net.load line
- Set the number of classes to 2
- Set IGNORE_LABEL to an arbitrary number other than 0 or 255 (I set it to 100)
I then trained the network and got good prediction results (I had to update inference.py and tools.py to get this working):
Here is the original image
What I did for training is the following:
- Ran python train.py for 8 hrs until the loss reached 0.281, then stopped.
- Ran python train.py --update-mean-var --train-beta-gamma (still running); the loss is dropping to 0.27 and continuing.
When you trained on other datasets, how did you use train.py and train.py --update-mean-var --train-beta-gamma (meaning for how long, and for what purpose)?
from icnet-tensorflow.
Hi,
In response to @qmy612 (also @ogail): you can indeed use the pretrained model.
I got it working yesterday by doing the following:
- As in #20, update icnet_cityscapes_bnnomerge.prototxt by changing conv6_cls num_output from 19 to your number of classes (this is from @ogail's initial question).
- Then go to network.py, to the load function of class Network, and add the line
if 'conv6_cls' not in var.name:
before the line session.run(var.assign(data)). Also change ignore_missing to True.
The function should look something like:
def load(self, data_path, session, ignore_missing=True):
    # data_path points to the pretrained .npy weights file
    data_dict = np.load(data_path, encoding='latin1').item()
    for op_name in data_dict:
        with tf.variable_scope(op_name, reuse=True):
            for param_name, data in data_dict[op_name].items():
                try:
                    if 'bn' in op_name:
                        param_name = BN_param_map[param_name]
                    var = tf.get_variable(param_name)
                    # Skip the classification layer, whose shape depends on the number of classes
                    if 'conv6_cls' not in var.name:
                        session.run(var.assign(data))
                except ValueError:
                    if not ignore_missing:
                        raise
Then you can optionally make the change stated in #20, that is, replacing
restore_var = tf.global_variables()
with
restore_var = [v for v in tf.global_variables() if 'conv6_cls' not in v.name]
Either way the effect should be the same, since conv6_cls, the last (classification) layer of the net, has not been loaded from the pretrained model.
Hope this helps.
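To make the filtering idea concrete, here is a minimal sketch in plain Python (no TensorFlow needed). The variable names in the list are hypothetical and only there for illustration; the helper restorable_names is not part of the repo. It applies the same substring test as the restore_var line above.

```python
def restorable_names(var_names, skip_substring="conv6_cls"):
    """Return the names that should be restored from the pretrained weights.

    Anything whose name contains skip_substring (the classification layer)
    is left out, so it keeps its fresh initialization.
    """
    return [name for name in var_names if skip_substring not in name]


# Hypothetical variable names, for illustration only.
names = [
    "conv1_1_3x3_s2/weights",
    "conv5_4_k1/weights",
    "conv6_cls/weights",
    "conv6_cls/biases",
]
kept = restorable_names(names)
print(kept)
```

Everything except the two conv6_cls entries is kept, which is why the class count of the pretrained model no longer matters for restoring.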
from icnet-tensorflow.
For me this worked:
- In network.py, set ignore_missing to True:
def load(self, data_path, session, ignore_missing=True):
- Edit INFER_SIZE, TRAINING_SIZE and the whole dict of others_param
- In train.py change
restore_var = tf.global_variables()
to
restore_var = [v for v in tf.global_variables() if 'conv6_cls' not in v.name]
- Run
python train.py --dataset others
from icnet-tensorflow.
- I think you mean line 189, which is:
net.load(args.restore_from, sess)
I tried it, and the loss comes out as 'nan'.
- I also tried loading from a saved checkpoint (instead of the numpy file); however, the loss stays fixed at 0.511 and these
sub4 = 0.000
sub24 = 0.000
sub124 = 0.000
do not change at all.
Any ideas?
from icnet-tensorflow.
@BCJuan excuse me for late reply. Yes I did both changes as well.
from icnet-tensorflow.
@hellochick @ogail hello, my question is: if my dataset has 2 classes, can I only use this network by training from scratch? Can't I reuse the earlier layers of the pre-trained model, or train only the last cls layer? In my experiments with Caffe, I can train starting from pre-trained models. I am not familiar with TF, but the deeplab_v3+ TensorFlow implementation also supports training only the last layer.
from icnet-tensorflow.
Hey @ogail,
By default, the script loads the pre-trained model and keeps fine-tuning it. However, the pre-trained cityscapes model has 19 classes, while your dataset has only 1. You can comment out line 191 to solve the problem and train from scratch.
from icnet-tensorflow.
Before that, I want to know what your dataset looks like; can you show some examples? If there is only one class, there is nothing to train, am I right?
from icnet-tensorflow.
The dataset has 2 classes, obstacles (0) and non-obstacles (255), in a binary format. Here is an example of a raw image
This is an example of a label image (similar to the *labelTrainIds* images in cityscapes)
Think of this as semantic segmentation with two labels (background and foreground). Hope it makes sense. FYI, I set IGNORE_LABEL to 0
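One thing that may matter with 0/255 masks: sparse softmax cross-entropy expects label ids in the range [0, num_classes), so with 2 classes a raw value of 255 is out of range unless it is the ignore label. A minimal sketch of remapping raw mask values to training ids follows. The helper remap_label and the mapping are assumptions for illustration, not code from the repo.

```python
def remap_label(pixels, mapping, ignore_label=255):
    """Map raw mask values to training ids.

    pixels is a 2D list of raw mask values; values missing from
    mapping are sent to ignore_label so they can be masked out of the loss.
    """
    return [[mapping.get(value, ignore_label) for value in row] for row in pixels]


# Raw binary mask: 0 = obstacle, 255 = non-obstacle (as described above).
raw_mask = [
    [0, 0, 255],
    [255, 0, 255],
]
# Assumed target ids: obstacle -> 0, non-obstacle -> 1.
train_ids = remap_label(raw_mask, {0: 0, 255: 1})
print(train_ids)
```

With ids 0 and 1 in the mask, IGNORE_LABEL can then be set to a value that does not collide with either class.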
from icnet-tensorflow.
It makes sense to me. For this case, I think it's difficult to learn to detect obstacles, since the obstacles comprise several different objects. Hence, I think you can restore a model pre-trained on ImageNet or on ADE20k segmentation, and set the learning rate much lower to try this task.
Btw, I have tried obstacle detection before, and you can refer to Indoor Segmentation. In that project, I detect obstacles by training on ADE20k, and I compressed num_classes from 150 to 27, just for your reference.
from icnet-tensorflow.
I'm trying to do something similar with the LFW data set http://vis-www.cs.umass.edu/lfw/part_labels/
I've set num_classes to 3 and rearranged the masks so that each mask is a grayscale image where 0 is hair, 1 is face and 2 is background. I also removed the net.load line from the code. The error I'm getting occurs when the line loss = tf.nn.sparse_softmax_cross_entropy_with_logits is called:
ValueError: Rank mismatch: Rank of labels (received 1) should equal rank of logits minus 1 (received 1).
Can you please explain what the function create_loss expects as input? What are the shapes of output and label? When I try it, I get a label of shape (16,250,250,1) and an output of shape (16,15,15,3).
After reshaping, raw_pred has shape=(10800,) but label has shape=(3600,). There is a mismatch here and I suspect it's why the function fails, but I can't work out what to do.
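The reported numbers are consistent with the logits having been flattened all the way to rank 1 (16 * 15 * 15 * 3 = 10800) while the labels, resized to 15x15, give 16 * 15 * 15 = 3600 entries. sparse_softmax_cross_entropy_with_logits wants logits of shape [N, num_classes] and labels of shape [N]. The sketch below only does the shape arithmetic in plain Python; flattened_shapes is a hypothetical helper, and why the reshape ended up rank 1 in this particular run is not something the thread establishes.

```python
from math import prod


def flattened_shapes(logits_shape, num_classes):
    """Shapes after reshaping logits to [-1, num_classes] and labels to [-1]."""
    n_pixels = prod(logits_shape) // num_classes
    return (n_pixels, num_classes), (n_pixels,)


logits_shape = (16, 15, 15, 3)   # batch, height, width, classes (from the post)
total_values = prod(logits_shape)
pred_shape, label_shape = flattened_shapes(logits_shape, num_classes=3)
print(total_values, pred_shape, label_shape)
```

If the prediction keeps its class dimension, its first dimension matches the label length and the rank requirement is met.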
from icnet-tensorflow.
@ogail Thank you for the information you provided.
Any chance you could make your script public? It would help us a lot.
Thank you in advance
from icnet-tensorflow.
@bhadresh74 is there a specific question you have?
from icnet-tensorflow.
@ogail Yes.
A couple of them, actually.
- I trained on two classes but my loss seems to be stuck at 0.6 and is not going down.
Here are my hyperparameters:
batch size: 64
Steps: 60000
Others are as given in the repo.
- During inference, how can I extract the probability for each class? The given code returns 0 probability for each pixel for some reason. I would like to know how you extracted the softmax outputs.
Thank you
from icnet-tensorflow.
@bhadresh74 Here are some suggestions:
1- Getting the loss to 0.6 is a good indication; pushing it lower will require some tinkering, like:
- increasing the number of training steps
- increasing the batch size
- checking whether the ground truth labels have errors that are consistently failing.
2- I have not tried to extract the probability before.
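On the probability question: per-class probabilities are normally obtained by applying a softmax over the class dimension of the raw network output (the logits) before any argmax is taken. A minimal plain-Python sketch for a single pixel is below; the logit values are made up, and where exactly to hook this into the repo's inference code is not shown in this thread.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of per-class logits."""
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for one pixel of a 2-class model.
pixel_logits = [2.0, 0.5]
probs = softmax(pixel_logits)
print(probs)
```

In TensorFlow the equivalent would be tf.nn.softmax applied to the logits tensor along its last axis.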
from icnet-tensorflow.
Hi,
I would like to ask you a question, @ogail, since I had the same problems.
I see that you have done the following:
- commenting out the net.load line
- Setting number of classes to 2
- Setting IGNORE_LABEL to arbitrary number not 0 or 255 (i set it to 100)
But have you also made the changes that you stated at the beginning? Mainly:
- Updating icnet_cityscapes_bnnomerge.prototxt by changing conv6_cls num_output from 19 to 1
- Replacing this line in train.py
restore_var = tf.global_variables()
with
restore_var = [v for v in tf.global_variables() if 'conv6_cls' not in v.name]
from icnet-tensorflow.
Yes, you will have to train from scratch
from icnet-tensorflow.
@BCJuan did fine-tuning from the pretrained model boost results on your custom task? Have you tried comparing that with training from scratch?
from icnet-tensorflow.
Yes, it boosted the results. Indeed I was not obtaining any good results without the pretrained model.
I used the icnet_cityscapes_trainval_bnomerge_90k, but I think that any other model can be used.
from icnet-tensorflow.
@BCJuan what's the mIoU before and after using the cityscapes pretrained model?
from icnet-tensorflow.
@BCJuan I did load the pretrained model however didn't see much diff between fine-tuning vs training from scratch.
from icnet-tensorflow.
@ogail I do not know, since I am just fine-tuning. But with a one-hour run, I get about 20% mIoU using the pretrained model versus 6% without it. Maybe I am doing something wrong.
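For anyone comparing runs by mIoU as above, here is a minimal plain-Python sketch of the usual definition: per class, intersection over union of predicted and ground-truth pixels, averaged over the classes that appear. The function and the tiny example are illustrative only and are not the repo's evaluation code.

```python
def mean_iou(pred, label, num_classes, ignore_label=255):
    """Mean intersection-over-union over flat lists of class ids."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, label):
        if t == ignore_label:
            continue  # ignored pixels do not count for any class
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            union[t] += 1  # missed pixel of the true class
            union[p] += 1  # false positive for the predicted class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1]
label = [0, 1, 1, 255]
miou = mean_iou(pred, label, num_classes=2)
print(miou)
```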
from icnet-tensorflow.
@BCJuan Thank you very much, I will try tomorrow.
from icnet-tensorflow.
I'll try fine-tuning too and will report the results. In my experience, fine-tuning generally helps the model generalize, so it's worth trying.
from icnet-tensorflow.
Hi @ogail,
Thank you very much for sharing your training steps with us.
Recently I needed to solve the same problem as you. I set my network parameters the same as yours, and the loss reaches 0.17 and keeps going down.
However, when I run inference, the result shows the whole image as 0 or 1, which does not seem right.
Did you have this problem? Thank you!
from icnet-tensorflow.
@ogail Have you tried to train it for multiple classes? What annotating tool I can use for training it on multiple classes? Also what is the accuracy and fps you are getting on evaluation?
from icnet-tensorflow.
@VincentGu11 are the 0 and/or 1 values how the final rendered image looks? There's a function decode_label that converts training index to RGB color.
@PratibhaT yes, I tried. You could use the labelme tool. The accuracy and fps depend on the data and the problem, so my numbers won't be relevant in a general sense.
from icnet-tensorflow.
@ogail I used VIA annotation tool, which gives .json file. But in this code the list.txt refers to .png image for label. Is there a way to convert a .json annotation files to .png to be used as label. What is the output of labelme tool?
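A do-it-yourself conversion mostly comes down to rasterizing each annotated polygon into a 2D array of class ids and saving that array as a single-channel PNG. Below is a minimal plain-Python sketch of the rasterization step, assuming each region has already been reduced to a class id plus lists of x and y vertices (VIA polygon regions store these as all_points_x and all_points_y, as far as I know; treat that as an assumption). Reading the JSON and writing the PNG with an imaging library are left out.

```python
def point_in_polygon(x, y, xs, ys):
    """Even-odd rule test: is point (x, y) inside the polygon (xs, ys)?"""
    inside = False
    j = len(xs) - 1
    for i in range(len(xs)):
        # Does the edge (j -> i) straddle the horizontal line through y?
        if (ys[i] > y) != (ys[j] > y):
            x_cross = xs[i] + (y - ys[i]) * (xs[j] - xs[i]) / (ys[j] - ys[i])
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def rasterize(width, height, polygons, background=0):
    """Build a height x width mask; polygons is a list of (class_id, xs, ys)."""
    mask = [[background] * width for _ in range(height)]
    for class_id, xs, ys in polygons:
        for row in range(height):
            for col in range(width):
                # Test the pixel center so shared edges are handled consistently.
                if point_in_polygon(col + 0.5, row + 0.5, xs, ys):
                    mask[row][col] = class_id
    return mask


# One hypothetical square region of class 1 on a 4x4 image.
mask = rasterize(4, 4, [(1, [1, 3, 3, 1], [1, 1, 3, 3])])
for row in mask:
    print(row)
```

The brute-force loop is slow for large images; a library polygon-fill routine would be the practical choice, but the resulting mask has the same meaning.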
from icnet-tensorflow.
@ogail I am training it for my own dataset consisting of 8 classes. I did all the required changes mentioned above but i am still getting the following error :-
Assign requires shapes of both tensors to match. lhs shape= [8] rhs shape= [19]
Is there some particular change that i missed out?
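An error of this form (lhs shape [8], rhs shape [19]) means a tensor saved with 19 classes is being assigned to a variable built for 8. One way to see which variables are affected is to compare shapes by name before restoring. The sketch below does that on plain dictionaries; the names and shapes are hypothetical, and find_shape_mismatches is not a function from the repo.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for tensors that differ."""
    mismatches = {}
    for name, model_shape in model_shapes.items():
        saved_shape = checkpoint_shapes.get(name)
        if saved_shape is not None and tuple(saved_shape) != tuple(model_shape):
            mismatches[name] = (tuple(saved_shape), tuple(model_shape))
    return mismatches


# Hypothetical shapes: a 19-class checkpoint versus an 8-class model.
checkpoint_shapes = {
    "conv5_4_k1/weights": (1, 1, 256, 256),
    "conv6_cls/weights": (1, 1, 128, 19),
    "conv6_cls/biases": (19,),
}
model_shapes = {
    "conv5_4_k1/weights": (1, 1, 256, 256),
    "conv6_cls/weights": (1, 1, 128, 8),
    "conv6_cls/biases": (8,),
}
mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
print(sorted(mismatches))
```

Any variable that shows up here has to be excluded from restoring, which is what the conv6_cls filter discussed in this thread is meant to do.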
from icnet-tensorflow.
@qmy612, can you share some details about your training with the Caffe framework?
I am having problems training with the downloaded matcaffe.
from icnet-tensorflow.
@ogail Thank you for the information you provided. Can I ask you two questions?
1.Did you use the ADE20k or any other pre-training model to fine-tune when training your own datasets?
2.What is the basis for setting the IGNORE_LABEL value?
Looking forward to your answer.
from icnet-tensorflow.
@PratibhaT I did not search for such a tool; I'd just do the conversion myself to get going.
@adisrivasa It seems that either (1) the number of classes in train.py is not set to 8, or (2) the prototxt file for the pretrained checkpoint is not updated to use 8 classes instead of 19.
@yeyuanzheng177 (1) I did not use ADE20k for fine-tuning; I used cityscapes instead. (2) I set this value to 255
from icnet-tensorflow.
@qmy612, hello, can you give some advice on how to train ICNet on Caffe? Thank you for your help.
from icnet-tensorflow.
@ogail
Thank you for the information you provided.
My data set is the same as yours. (0,0,0) and (255,255,255) are represented by two categories of tags.
When I set IGNORE_LABEL = 0, the result is
Sub4 =nan
Sub24 =nan
Sub124 =nan
When I set IGNORE_LABEL not to 0, the result is
Step 0 total loss = 3.639, sub4 = 0.471, sub24 = 0.857, sub124 = 1.916 (3.606 sec/step)
Step 1 total loss = 1.897, sub4 = 0.281, sub24 = 0.521, sub124 = 0.451 (0.161 sec/step)
Step 2 total loss = 1.342, sub4 = 0.180, sub24 = 0.267, sub124 = 0.088 (0.162 sec/step)
Step 3 total loss = 1.328, sub4 = 0.181, sub24 = 0.330, sub124 = 0.033 (0.158 sec/step)
Step 4 total loss = 1.173, sub4 = 0.108, sub24 = 0.160, sub124 = 0.007 (0.161 sec/step)
Step 5 total loss = 1.132, sub4 = 0.129, sub24 = 0.074, sub124 = 0.006 (0.159 sec/step)
Step 6 total loss = 1.411, sub4 = 0.056, sub24 = 0.028, sub124 = 0.339 (0.160 sec/step)
Step 7 total loss = 1.055, sub4 = 0.033, sub24 = 0.009, sub124 = 0.001 (0.158 sec/step)
Step 8 total loss = 1.049, sub4 = 0.018, sub24 = 0.004, sub124 = 0.001 (0.160 sec/step)
Step 9 total loss = 1.055, sub4 = 0.025, sub24 = 0.006, sub124 = 0.000 (0.158 sec/step)
These results have troubled me.
I set up the program according to your description, both for training from scratch and for fine-tuning, but it did not work.
Can you share your tips for training the network?
Looking forward to your answer.
from icnet-tensorflow.
@VincentGu11 Hello. Did you solve the problem (the result shows the whole image as 0 or 1)? Thank you! I have the same problem as you.
from icnet-tensorflow.
Cool, you have succeeded!
I am also training on my own data, and I would like to know how many annotated images you needed for a satisfactory result (@ogail)?
from icnet-tensorflow.
Sorry, I was wondering if you guys were using the pretrained icnet_cityscapes_bnnomerge.prototxt instead of icnet_cityscapes_trainval_90k_bnnomerge.npy @ogail @hellochick
So, how could I update the pretrained model by changing conv6_cls num_output from 19 to 1? @BCJuan
from icnet-tensorflow.
Hey @erichhhhho,
You need to change the restore variables like this:
restore_var = [v for v in tf.global_variables() if 'conv6_cls' not in v.name]
That way you can restore the pre-trained weights except for the last layer.
from icnet-tensorflow.
I still get the same problem even though I set restore_var without conv6_cls (for model retraining using .npy). Am I missing something?
from icnet-tensorflow.
The project no longer has inference.py and tool.py; do you still have the version you used?
from icnet-tensorflow.
At least for inference.py, it actually exists as a python notebook (demo.ipynb).
from icnet-tensorflow.
Hi @ogail
Please forgive me for disturbing you. Could you give me a copy of your code as it was at that time? Since the author has iterated on the repository, I found that some of the described code changes may no longer match. Thank you!
Finally, please forgive my poor English.
from icnet-tensorflow.
Hello, have you solved this problem (NaN sub-losses with IGNORE_LABEL = 0, and sub-losses collapsing to nearly zero within a few steps otherwise)? It has been troubling me for a long time as well!
from icnet-tensorflow.
The main reason for this problem is the function create_loss() in train.py: the author ignores the background when computing the loss. If your NUM_CLASSES is 2 and one of the classes is ignored, the loss will be very low if the model always predicts the other class for every pixel. So your loss reaches 0.5 (because the l2 loss is 0.5), but the model has learned nothing. To solve the problem, you have to change the code that ignores one class.
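The effect can be reproduced with a few lines of plain Python. This is a sketch of a cross-entropy averaged only over non-ignored pixels, not the repo's create_loss(); the labels and the "always predict class 1" probabilities are made up for illustration.

```python
import math


def masked_cross_entropy(probs, labels, ignore_label):
    """Mean cross-entropy over pixels whose label is not ignore_label."""
    losses = [-math.log(p[t]) for p, t in zip(probs, labels) if t != ignore_label]
    # With no pixels left after masking, the mean is undefined.
    return sum(losses) / len(losses) if losses else float("nan")


labels = [0, 0, 1, 1, 0, 1]
# A degenerate model that predicts class 1 with 99% confidence everywhere.
always_one = [[0.01, 0.99]] * len(labels)

loss_ignoring_zero = masked_cross_entropy(always_one, labels, ignore_label=0)
loss_keeping_both = masked_cross_entropy(always_one, labels, ignore_label=100)
print(loss_ignoring_zero, loss_keeping_both)
```

When class 0 is the ignore label, only the class-1 pixels are scored, so the degenerate model gets a near-zero loss. With an ignore label that matches neither class, the same model is penalized heavily on the class-0 pixels.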
from icnet-tensorflow.
I am trying to train ICNet on a custom dataset with 2 classes, the background and the object, but I received an error due to cityscapes having 19 classes while mine only includes 2. I have followed the instructions above, which seem to have solved the class problem but now during training all parameters are nan.
I do not understand what is recommended in #20 and in this thread regarding making changes to .prototxt and .npy files. If this is necessary, could you explain how to make this change? If not, what is causing the nan loss results?
Thanks!
from icnet-tensorflow.
how do i change the number of classes for the pre-trained model? i've found the .prototxt file from the original work but i don't know what to do with it, where to load it.
from icnet-tensorflow.