mmortazavi commented on May 18, 2024

@cole8888 Oh, sorry, I should have mentioned that I was running create_pascal_tf_record.py on TensorFlow 1.14.0, which is why I did not need to change much in the code.

I wonder if that is why the generated TFRecords do not work for training with this repo, which is based on TensorFlow 2.

I just tried to create the TFRecords again by executing create_pascal_tf_record.py under TensorFlow 2, specifically 2.0.0-rc1, and I can confirm that I had to change a few things (as you mentioned):

  • tf.app.run() to tf.compat.v1.app.run()
  • tf.app.flags to tf.compat.v1.app.flags
  • tf.python_io to tf.compat.v1.python_io
  • tf.gfile to tf.io.gfile
Note that these changes were necessary not only in create_pascal_tf_record.py but also in other modules under utils; the error messages point you to each one, so you can fix them one by one. Finally, running:

$ python data\create_pascal_tf_record.py --data_dir=data\ --year=datasetName --set=test --output_path=data\record\test.tfrecord --label_map_path=data\labelmap.pbtxt

successfully generated the TFRecords again.
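The renames above are mechanical, so a rough script can apply them before you chase the remaining errors by hand. This is a minimal sketch, assuming the TF1 names appear literally as `tf.app.`, `tf.python_io.` and `tf.gfile.`; it is a plain regex rewrite, not a parser, and `upgrade_tf1_source` is a hypothetical name. TensorFlow 2 also ships a `tf_upgrade_v2` command-line tool that performs a more thorough conversion.

```python
import re

# (pattern, replacement) pairs for the renames listed above.
# The lookbehind skips matches inside longer names such as "mytf.app.".
TF2_RENAMES = [
    (re.compile(r"(?<![\w.])tf\.app\."), "tf.compat.v1.app."),
    (re.compile(r"(?<![\w.])tf\.python_io\."), "tf.compat.v1.python_io."),
    (re.compile(r"(?<![\w.])tf\.gfile\."), "tf.io.gfile."),
]


def upgrade_tf1_source(source: str) -> str:
    """Return `source` with the TF1 names above rewritten to their TF2 forms."""
    for pattern, replacement in TF2_RENAMES:
        source = pattern.sub(replacement, source)
    return source
```

Running it twice is harmless, because none of the replacement strings contain the original prefixes.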

BTW, about that error:

Epoch 1/100 2019-10-02 10:18:53.096213: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at sparse_xent_op.cc:90 : Invalid argument: Received a label value of -1 which is outside the valid range of [0, 80). Label values: 0 0 0 0 0 0 0 0 ....

Yes, I think this is a different issue. I used to get this too, but unfortunately I do not remember how I fixed it! I think I kept searching until I found a suggestion, or I was simply missing some changes in files that were not yet adapted to my dataset (take a closer look)!

However, I still cannot train! Training starts the first epoch, and after a few minutes it stops with:

[[node loss/yolo_output_2_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at C:\Users\MajidMortazavi\Anaconda3\envs\yolov3-tf2\lib\site-packages\tensorflow_core\python\framework\ops.py:1751) ]] [Op:__inference_distributed_function_40079]

Function call stack:
distributed_function

It seems this is a known issue with TensorFlow 2.0.0-rc*; I found this open issue. So, @AnaRhisT94: no!

from yolov3-tf2.

commented on May 18, 2024

@rlewkowicz were you able to train properly after you got your tf records? Right now there are open issues regarding training. Just wondering if you managed to train successfully.

mmortazavi commented on May 18, 2024

I was facing the same issue and came to believe it has to do with the TFRecord files (see TensorFlow's documentation on TFRecords). Here I am sharing the full story and the solution that worked for me; I hope others can use it.

I initially used TFRecord files generated with the generate_tfrecord.py script from, I believe, Dat Tran's object detection repo. However, those TFRecord files won't work, because they are missing the image/key/sha256 feature!
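For reference, `image/key/sha256` is simply the hex SHA-256 digest of the encoded image bytes; that is how the create_pascal_tf_record.py posted further down computes it. The sketch below uses a plain dict as a stand-in for the tf.train.Example feature map so it runs without TensorFlow. `REQUIRED_KEYS` is an assumed list and the helper names are hypothetical, so match the keys to the features your training loader actually parses.

```python
import hashlib

# Assumed set of feature keys the loader parses; adjust to your dataset code.
REQUIRED_KEYS = {
    "image/encoded",
    "image/key/sha256",
    "image/object/bbox/xmin",
    "image/object/bbox/ymin",
    "image/object/bbox/xmax",
    "image/object/bbox/ymax",
    "image/object/class/text",
}


def image_key(encoded_jpg: bytes) -> bytes:
    """Hex SHA-256 of the raw image bytes, UTF-8 encoded as the script stores it."""
    return hashlib.sha256(encoded_jpg).hexdigest().encode("utf8")


def missing_keys(features: dict) -> set:
    """Return the required feature keys that are absent from `features`."""
    return REQUIRED_KEYS - set(features)
```

A record built without the hash would report `image/key/sha256` among its missing keys.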

Others suggested an easy solution: export the TFRecords with Microsoft VoTT. However, I already had annotated images, and I couldn't reload their annotations into VoTT, not even in the desktop version, so that I could simply export the TFRecords. (Maybe I missed how to do it; I spent an hour, couldn't figure it out, and concluded the feature is not available.) And there was no way I would relabel everything from scratch! Long story short, the repo recommends create_pascal_tf_record.py, but it was not clear to me how to use it until I came across this blog post. The blog explains well enough what needs changing before you simply run the script: either you modify the script further, or you adopt the proposed data structure and the script works as is. A few remarks on top of the blog post's documentation:

  • train.txt/test.txt simply contain the list of train/test file names (shared by each image and its XML), without extensions and without a header!
  • Take extra care with Windows paths. Although the blog says 'change the data['folder'] in line 85 into your own dataset path', on Windows I replaced data['folder'] with e.g. r'C:\full\path\to\datasetName'. Notice the r in front of the string: it makes it a raw string, so the backslashes are not interpreted as escape sequences.
  • Funnily, the XML generated by one of these annotation tools used 'annotations' as its root element, whereas line 181 of create_pascal_tf_record.py originally reads dataset_util.recursive_parse_xml_to_dict(xml)['annotation']. Make sure this key matches your XML.
  • If you follow the directory structure proposed in the blog, be careful with module imports (from object_detection.utils import dataset_util, label_map_util). The error will be obvious and you will fix it, but if you struggle, just add sys.path.append("..") (after import sys) before importing object_detection.utils, and you should be fine.
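The first and third remarks can be checked with a few lines of standard-library Python. This is a sketch, assuming one `.xml` file per image in an `Annotations` folder; both helper names are hypothetical. `write_image_set` writes a train.txt/test.txt in the expected shape (bare base names, no extensions, no header), and `xml_root_tag` reports the root element name. That name is the key to index with, because dataset_util.recursive_parse_xml_to_dict returns a dict keyed by the root tag.

```python
import os
import xml.etree.ElementTree as ET


def write_image_set(annotations_dir: str, out_path: str) -> list:
    """Write one base name per line (no extension, no header) for every .xml file."""
    names = sorted(
        os.path.splitext(f)[0]
        for f in os.listdir(annotations_dir)
        if f.lower().endswith(".xml")
    )
    with open(out_path, "w", encoding="utf8") as fh:
        fh.write("".join(name + "\n" for name in names))
    return names


def xml_root_tag(xml_str: str) -> str:
    """Return the root element name, e.g. 'annotation' or 'annotations'."""
    return ET.fromstring(xml_str).tag
```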

In the end, paying close attention to these small details (it is not that hard, but they matter), you can run the script (I did so within the Anaconda Prompt on Windows):

$ python data\create_pascal_tf_record.py --data_dir=data\ --year=datasetName --set=test --output_path=data\record\test.tfrecord --label_map_path=data\labelmap.pbtxt

The TFRecords generated will work just fine with yolov3-tf2 (this repo). Happy detecting.

ancoca13 commented on May 18, 2024

Hello,

I found the problem. Some of the images I was using had a lot of bounding boxes. Specifically, the image that was failing had 113 of them.

I'm pretty sure the problem is the following line in dataset.py:

paddings = [[0, 100 - tf.shape(y_train)[0]], [0, 0]]

That's why I get "Paddings must be non-negative: 0 -13": 100 - 113 = -13.

As I'm not sure whether it would be safe to increase that 100 in the code, I decided to remove that image from my TFRecord file, and now it works.
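The arithmetic behind the message is simple: the loader pads each image's box tensor up to a fixed 100 rows, so the padding is 100 minus the number of boxes, and an image with more boxes gives a negative value. A sketch for finding the offending images before writing the TFRecord, assuming Pascal VOC style XML with one `<object>` element per box; the function names are hypothetical and the `max_boxes` default mirrors the hard-coded 100. The maintainer later added a `--yolo_max_boxes` flag, which appears to target this same limit.

```python
import xml.etree.ElementTree as ET


def count_objects(xml_str: str) -> int:
    """Number of <object> entries directly under the annotation root."""
    return len(ET.fromstring(xml_str).findall("object"))


def padding_rows(num_boxes: int, max_boxes: int = 100) -> int:
    """Rows of zero padding needed to reach max_boxes; negative means too many boxes."""
    return max_boxes - num_boxes


def images_over_limit(box_counts: dict, max_boxes: int = 100) -> dict:
    """Return {image name: box count} for images whose padding would be negative."""
    return {
        name: count
        for name, count in box_counts.items()
        if padding_rows(count, max_boxes) < 0
    }
```

With this, the 113-box image from this comment yields a padding of -13, and the "0 -16" error reported elsewhere in the thread corresponds to an image with 116 boxes.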

WHBSmith commented on May 18, 2024

Does training work for one image? If you look at the function that loads the dataset, you'll see that it can handle multiple TFRecord files: simply call dataset.load_tfrecord_dataset and pass /your_directory/*.tfrecord as the file pattern.
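Before training on a pattern like that, it can help to expand the glob yourself and count the records in each file. An uncompressed TFRecord file is a plain sequence of framed records (8-byte little-endian length, 4-byte CRC of the length, payload, 4-byte CRC of the payload), so counting needs no TensorFlow. This sketch skips CRC verification and does not handle GZIP/ZLIB-compressed files; the helper names are hypothetical. Inside TensorFlow, `tf.data.Dataset.list_files` together with `tf.data.TFRecordDataset` does the actual reading.

```python
import glob
import os
import struct


def count_tfrecords(path: str) -> int:
    """Count records in an uncompressed TFRecord file without verifying CRCs."""
    size = os.path.getsize(path)
    count = 0
    with open(path, "rb") as fh:
        while True:
            header = fh.read(8)
            if not header:
                return count  # clean end of file
            if len(header) < 8:
                raise ValueError(f"{path}: truncated length header")
            (length,) = struct.unpack("<Q", header)
            end = fh.tell() + 4 + length + 4  # skip length CRC, payload, payload CRC
            if end > size:
                raise ValueError(f"{path}: truncated record #{count}")
            fh.seek(end)
            count += 1


def summarize(pattern: str) -> dict:
    """Map each file matching `pattern` (e.g. '/your_directory/*.tfrecord') to its record count."""
    return {path: count_tfrecords(path) for path in sorted(glob.glob(pattern))}
```

An empty dict means the pattern matched nothing, which is worth ruling out before suspecting the records themselves.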

zzh8829 commented on May 18, 2024

I have added a --yolo_max_boxes flag and written a complete training tutorial. Please create a new issue if you still have trouble with training.

https://github.com/zzh8829/yolov3-tf2/blob/master/docs/training_voc.md

rlewkowicz commented on May 18, 2024

Where are you getting your TFRecords from? I recall having similar issues, and what I can say is that the VoTT Pascal VOC export works amazingly well.

You'll have to make some small modifications to this script:
https://github.com/tensorflow/models/blob/master/research/object_detection/dataset_tools/create_pascal_tf_record.py

But it will generate fully functional TFRecords.

Pari-singh commented on May 18, 2024

@ycelik @rlewkowicz I agree with ycelik: I pass the TFRecord file for my custom dataset and get errors while training.

Rainweic commented on May 18, 2024

Did you solve it? I'm hitting an error similar to yours.

2019-07-26 09:28:27.838122: W tensorflow/core/framework/op_kernel.cc:1546] OP_REQUIRES failed at iterator_ops.cc:1055 : Invalid argument: Paddings must be non-negative: 0 -16
         [[{{node Pad}}]]
Traceback (most recent call last):
  File "train.py", line 177, in <module>
    app.run(main)
  File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train.py", line 116, in main
    for batch, (images, labels) in enumerate(train_dataset):
  File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 586, in __next__
    return self.next()
  File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 623, in next
    return self._next_internal()
  File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 615, in _next_internal
    output_shapes=self._flat_output_shapes)
  File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2120, in iterator_get_next_sync
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Paddings must be non-negative: 0 -16
         [[{{node Pad}}]] [Op:IteratorGetNextSync]
Exception ignored in: <bound method _CheckpointRestoreCoordinator.__del__ of <tensorflow.python.training.tracking.util._CheckpointRestoreCoordinator object at 0x7f0c8052abe0>>
Traceback (most recent call last):
  File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/training/tracking/util.py", line 244, in __del__
  File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/training/tracking/util.py", line 93, in node_names
  File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/training/tracking/object_identity.py", line 76, in __getitem__
KeyError: (<tensorflow.python.training.tracking.object_identity._ObjectIdentityWrapper object at 0x7f0c8049a048>,)

shaunm commented on May 18, 2024

I managed to resolve the issue a while back by going back and making sure filepaths were absolute and that bbox coordinates were integers. I also used a different script to convert my training data to a tfrecord. Not sure which made the difference.

shaunm commented on May 18, 2024

I will leave this thread open for now, because it appears others are having the same issue.
Thank you @rlewkowicz for taking the time to provide suggestions. Sorry for the late reply to this thread; I got caught up in schoolwork.

Rainweic commented on May 18, 2024

> I managed to resolve the issue a while back by going back and making sure filepaths were absolute and that bbox coordinates were integers. I also used a different script to convert my training data to a tfrecord. Not sure which made the difference.

Can you show me the script? Thanks!

yonghuixu commented on May 18, 2024

Hello, I have had a similar problem these days and could not solve it. Do you have any advice?
Epoch: [93/10] step: [186/2] time: 0.2719242572784424s, mse: 0.03679807484149933
2019-07-28 12:23:43.405599: W tensorflow/core/framework/op_kernel.cc:1431] OP_REQUIRES failed at iterator_ops.cc:988 : Invalid argument: Input shape axis 0 must equal 4, got shape [3]
         [[{{node crop_to_bounding_box_1/unstack}}]]
Traceback (most recent call last):
  File "train.py", line 380, in <module>
    train()
  File "train.py", line 178, in train
    for step, (lr_patchs, hr_patchs) in enumerate(train_ds):
  File "/home/xyh/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 556, in __next__
    return self.next()
  File "/home/xyh/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 585, in next
    return self._next_internal()
  File "/home/xyh/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 577, in _next_internal
    output_shapes=self._flat_output_shapes)
  File "/home/xyh/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 1954, in iterator_get_next_sync
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input shape axis 0 must equal 4, got shape [3]
         [[{{node crop_to_bounding_box_1/unstack}}]] [Op:IteratorGetNextSync]
2019-07-28 12:23:43.507127: W tensorflow/core/kernels/data/generator_dataset_op.cc:79] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
         [[{{node PyFunc}}]]

iamsaksham commented on May 18, 2024

Any updates? This issue still exists.
I got the same errors.

iamsaksham commented on May 18, 2024

I think I've figured this out.

As suggested by @rlewkowicz, this is an issue with the TFRecord file: when it was generated, it didn't include the image/key/sha256 feature, which is required during training.

Please refer to the TensorFlow issue tensorflow/models#6253 (comment).

shaunm commented on May 18, 2024

> I think I've figured this out.
>
> As suggested by @rlewkowicz, this is an issue with the TFRecord file: when it was generated, it didn't include the image/key/sha256 feature, which is required during training.
>
> Please refer to the TensorFlow issue tensorflow/models#6253 (comment)

That ended up being the problem for me as well. Since then I have used Microsoft VoTT to mark up data; the tool has a TFRecord export option that works.

cole8888 commented on May 18, 2024

@mmortazavi could you share your version of create_pascal_tf_record.py? I am trying to recreate it based on your instructions and I have made progress but I am encountering some issues I can't solve.

mmortazavi commented on May 18, 2024

Sure, @cole8888. What kind of error do you get, though? I have the following directory tree:

[Image: directory tree]

And here is create_pascal_tf_record.py, slightly modified to generate the TFRecords:

# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

r"""Convert raw PASCAL dataset to TFRecord for object_detection.

Example usage:
    python object_detection/dataset_tools/create_pascal_tf_record.py \
        --data_dir=/home/user/VOCdevkit \
        --year=VOC2012 \
        --output_path=/home/user/pascal.record
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import hashlib
import io
import logging
import os
import sys

sys.path.append("..")

from lxml import etree
import PIL.Image
import tensorflow as tf

from object_detection.utils import dataset_util
from object_detection.utils import label_map_util


flags = tf.app.flags
flags.DEFINE_string('data_dir', '', 'Root directory to raw PASCAL VOC dataset.')
flags.DEFINE_string('set', 'train', 'Convert training set, validation set or '
                    'merged set.')
flags.DEFINE_string('annotations_dir', 'Annotations',
                    '(Relative) path to annotations directory.')
flags.DEFINE_string('year', 'VOC2007', 'Desired challenge year.')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
flags.DEFINE_string('label_map_path', 'data/pascal_label_map.pbtxt',
                    'Path to label map proto')
flags.DEFINE_boolean('ignore_difficult_instances', False, 'Whether to ignore '
                     'difficult instances')
FLAGS = flags.FLAGS

SETS = ['train', 'test']
YEARS = ['datasetName']  # must match the --year flag exactly (no trailing space)

def dict_to_tf_example(data,
                       dataset_directory,
                       label_map_dict,
                       ignore_difficult_instances=False,
                       image_subdirectory='JPEGImages'):
  """Convert XML derived dict to tf.Example proto.

  Notice that this function normalizes the bounding box coordinates provided
  by the raw data.

  Args:
    data: dict holding PASCAL XML fields for a single image (obtained by
      running dataset_util.recursive_parse_xml_to_dict)
    dataset_directory: Path to root directory holding PASCAL dataset
    label_map_dict: A map from string label names to integers ids.
    ignore_difficult_instances: Whether to skip difficult instances in the
      dataset  (default: False).
    image_subdirectory: String specifying subdirectory within the
      PASCAL dataset directory holding the actual image data.

  Returns:
    example: The converted tf.Example.

  Raises:
    ValueError: if the image pointed to by data['filename'] is not a valid JPEG
  """
  img_path = os.path.join(r'C:\tensorflow1\models\research\object_detection\data\datasetName', image_subdirectory, data['filename'])
  full_path = os.path.join(dataset_directory, img_path)
  with tf.gfile.GFile(full_path, 'rb') as fid:
    encoded_jpg = fid.read()
  encoded_jpg_io = io.BytesIO(encoded_jpg)
  image = PIL.Image.open(encoded_jpg_io)
  if image.format != 'JPEG':
    raise ValueError('Image format not JPEG')
  key = hashlib.sha256(encoded_jpg).hexdigest()

  width = int(data['size']['width'])
  height = int(data['size']['height'])

  xmin = []
  ymin = []
  xmax = []
  ymax = []
  classes = []
  classes_text = []
  truncated = []
  poses = []
  difficult_obj = []
  if 'object' in data:
    for obj in data['object']:
      difficult = bool(int(obj['difficult']))
      if ignore_difficult_instances and difficult:
        continue

      difficult_obj.append(int(difficult))

      xmin.append(float(obj['bndbox']['xmin']) / width)
      ymin.append(float(obj['bndbox']['ymin']) / height)
      xmax.append(float(obj['bndbox']['xmax']) / width)
      ymax.append(float(obj['bndbox']['ymax']) / height)
      classes_text.append(obj['name'].encode('utf8'))
      classes.append(label_map_dict[obj['name']])
      truncated.append(int(obj['truncated']))
      poses.append(obj['pose'].encode('utf8'))

  example = tf.train.Example(features=tf.train.Features(feature={
      'image/height': dataset_util.int64_feature(height),
      'image/width': dataset_util.int64_feature(width),
      'image/filename': dataset_util.bytes_feature(
          data['filename'].encode('utf8')),
      'image/source_id': dataset_util.bytes_feature(
          data['filename'].encode('utf8')),
      'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
      'image/encoded': dataset_util.bytes_feature(encoded_jpg),
      'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
      'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
      'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
      'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
      'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
      'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
      'image/object/class/label': dataset_util.int64_list_feature(classes),
      'image/object/difficult': dataset_util.int64_list_feature(difficult_obj),
      'image/object/truncated': dataset_util.int64_list_feature(truncated),
      'image/object/view': dataset_util.bytes_list_feature(poses),
  }))
  return example


def main(_):
  if FLAGS.set not in SETS:
    raise ValueError('set must be in : {}'.format(SETS))
  if FLAGS.year not in YEARS:
    raise ValueError('year must be in : {}'.format(YEARS))

  data_dir = FLAGS.data_dir
  years = ['datasetName']
  if FLAGS.year != 'merged':
    years = [FLAGS.year]

  writer = tf.python_io.TFRecordWriter(FLAGS.output_path)

  label_map_dict = label_map_util.get_label_map_dict(FLAGS.label_map_path)

  for year in years:
    logging.info('Reading from PASCAL %s dataset.', year)
    examples_path = os.path.join(data_dir, year, 'ImageSets', 'Main',
                                  FLAGS.set + '.txt')
    annotations_dir = os.path.join(data_dir, year, FLAGS.annotations_dir)
    examples_list = dataset_util.read_examples_list(examples_path)
    # print(examples_list)
    for idx, example in enumerate(examples_list):
      # print('Step 0', example)
      if idx % 100 == 0:
        logging.info('On image %d of %d', idx, len(examples_list))
      path = os.path.join(annotations_dir, example + '.xml')
      print('Step 1', path)
      with tf.gfile.GFile(path, 'r') as fid:
        xml_str = fid.read()
      xml = etree.fromstring(xml_str)
      data = dataset_util.recursive_parse_xml_to_dict(xml)['annotations']

      tf_example = dict_to_tf_example(data, FLAGS.data_dir, label_map_dict,
                                      FLAGS.ignore_difficult_instances)
      writer.write(tf_example.SerializeToString())

  writer.close()


if __name__ == '__main__':
  tf.app.run()

And this is how I call it from the Anaconda Prompt:

$ python data\create_pascal_tf_record.py --data_dir=data\ --year=datasetName --set=test --output_path=data\record\test.tfrecord --label_map_path=data\labelmap.pbtxt

Take good care of the directory structure. For me, datasetName sits inside the data folder that also contains create_pascal_tf_record.py (see below); keep it that way unless you know how to handle these paths properly!

|-- data
    |--datasetName
    |--record
    |--create_pascal_tf_record.py
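With the flags used above, main() reads `<data_dir>/<year>/ImageSets/Main/<set>.txt` and then `<data_dir>/<year>/Annotations/<name>.xml` for every listed name. A hypothetical pre-flight helper (not part of the script) can list what is missing before you run the conversion. It does not check images, because the posted version hard-codes the image folder inside dict_to_tf_example.

```python
import os


def missing_inputs(data_dir, year, split, annotations_dir="Annotations"):
    """List the set file or annotation files the conversion would fail to open."""
    set_file = os.path.join(data_dir, year, "ImageSets", "Main", split + ".txt")
    if not os.path.isfile(set_file):
        return [set_file]
    with open(set_file, encoding="utf8") as fh:
        # Roughly what read_examples_list does: first token of each non-empty line.
        names = [line.split()[0] for line in fh if line.strip()]
    xml_paths = (
        os.path.join(data_dir, year, annotations_dir, name + ".xml") for name in names
    )
    return [path for path in xml_paths if not os.path.isfile(path)]
```

For the command above, `missing_inputs("data", "datasetName", "test")` should return an empty list when the layout is right.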

This should work! Although, as I mentioned in #43, even with these TFRecords I have problems training. Let me know how you get on.

cole8888 commented on May 18, 2024

I tried your code and it gives me an error saying that tensorflow has no attribute "app".
From what I've read online Tensorflow 2.0 doesn't use this. Are you using a different version of Tensorflow?

Regardless I solved it by changing tf.app.run() to tf.compat.v1.app.run().
I got similar errors later which needed solutions such as flags = tf.app.flags to flags = tf.compat.v1.app.flags and tf.python_io to tf.compat.v1.python_io and tf.gfile.GFile to tf.io.gfile.GFile.

Now I get this error:
Epoch 1/100 2019-10-02 10:18:53.096213: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at sparse_xent_op.cc:90 : Invalid argument: Received a label value of -1 which is outside the valid range of [0, 80). Label values: 0 0 0 0 0 0 0 0 ....

after a lot of zeros and a few -1s I get:

2019-10-02 10:18:53.096300: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: Received a label value of -1 which is outside the valid range of [0, 80). Label values: 0 0 0.....

This seems to be related to the other issue you mentioned.
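The message says every label must lie in [0, 80), i.e. 0 to num_classes - 1 with 80 classes, and -1 falls outside that. One plausible way to get -1 is a name-to-id lookup that falls back to -1 for class names it does not know, for example a label in the XML that is missing from, or spelled differently in, the class-names file. Treat that as a hypothesis to check, not a confirmed cause. The sketch below mimics such a lookup in plain Python; the one-name-per-line class file format and all function names are assumptions.

```python
def load_class_names(text: str) -> list:
    """Class names, one per line; the line index is taken as the class id (assumed)."""
    return [line.strip() for line in text.splitlines()]


def to_ids(names: list, class_names: list, default: int = -1) -> list:
    """Map class names to ids, using `default` for names not in `class_names`."""
    table = {name: idx for idx, name in enumerate(class_names)}
    return [table.get(name, default) for name in names]


def check_labels(names: list, class_names: list, num_classes: int) -> dict:
    """Report unknown names and ids outside the valid range [0, num_classes)."""
    ids = to_ids(names, class_names)
    return {
        "unknown_names": sorted({n for n, i in zip(names, ids) if i == -1}),
        "out_of_range": sorted({i for i in ids if not 0 <= i < num_classes}),
    }
```

A case mismatch such as "person" vs "Person" is enough to produce a -1 here; an id at or above num_classes instead points to a num_classes setting that is too small.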

AnaRhisT94 commented on May 18, 2024

@mmortazavi Hi Majid,
Were you able to successfully train?
I'm getting NaNs when training. Any ideas how to solve this?
@cole8888 Hey Cole,
were you able to solve your issue?

AnaRhisT94 commented on May 18, 2024

@mmortazavi I see, thank you. I'm still trying to get training working on any dataset, it doesn't matter which, just to have it running and continue from there. If you get it working, please let me know.

cole8888 commented on May 18, 2024

@AnaRhisT94 I was unable to resolve my issues and have started looking into other repositories which use Tensorflow 1.15. Apparently Tensorflow 2.0 does not work well with any of the object detection programs right now, you are better off using Tensorflow 1.15. If you do manage to find one that works let me know!

mmortazavi commented on May 18, 2024

Agreed. Well, I have a detector trained with the Google Object Detection API that is working, but inference is slow, so I thought I'd give YOLO a shot. So far I have not been able to make a TensorFlow-based YOLO work. Meanwhile I found this repo, which looked promising, but I similarly ran into multiple issues, and it is not even TF2!

WHBSmith commented on May 18, 2024

@cole8888 @AnaRhisT94 It took a while, but I have YOLOv3 working quite nicely in TensorFlow 2.0. TFRecords are irritating because they are hard to verify and inspect. Microsoft VoTT (https://github.com/microsoft/VoTT) is the tool I use to create the TFRecords. Although it is a real pain to get working on Linux, I think you can use it from the browser, and I've heard it is easier to use on Windows. A nice test is to generate a single TFRecord file, train from scratch on it, and then test on that training image: 1000 epochs on one image of an object should get the model to learn that object. Once you've established that, it's not so hard to get it working for larger datasets.

cole8888 commented on May 18, 2024

@AntiDoctor I have VoTT installed in my Linux environment and I am able to label my images, but whenever I export them as TFRecords, it creates one file per image. How did you get it to work?
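One file per image is not necessarily a problem: an uncompressed TFRecord file is just a sequence of self-delimiting records with no file-level header, so per-image files can be concatenated byte for byte into a single file. A minimal sketch; the function name and paths are hypothetical, and it does not handle GZIP/ZLIB-compressed records. Alternatively, as suggested earlier in the thread, leave the files separate and pass a glob pattern to the loader.

```python
import glob
import os
import shutil


def merge_tfrecords(pattern: str, out_path: str) -> int:
    """Concatenate all files matching `pattern` into `out_path`; return how many were merged."""
    out_abs = os.path.abspath(out_path)
    # Skip the output file itself in case it also matches the pattern.
    paths = sorted(p for p in glob.glob(pattern) if os.path.abspath(p) != out_abs)
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return len(paths)
```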

cole8888 commented on May 18, 2024

@AntiDoctor using this repo (https://github.com/AntonMu/TrainYourOwnYOLO) @mmortazavi linked a few days ago I was able to train and detect with tensorflow 1.15.0 using CSV labels exported by VoTT. You should be able to get that one working also.

ancoca13 commented on May 18, 2024

I think I'm facing the same issue.

I get this error when I try to train the model:

tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __inference_Dataset_map_<lambda>_14279}} Paddings must be non-negative: 0 -13 [[{{node Pad}}]] [[IteratorGetNext]] [Op:__inference_distributed_function_34604]

It seems as if the TFRecord has some error. I've seen that this error appears when the bounding box points are specified as integers, but I have used normalized points with values between 0 and 1, so I don't really know what the problem is.

This is the code that I use to generate each TFRecord:

import hashlib
import os

import tensorflow as tf
from PIL import Image
from object_detection.utils import dataset_util


def create_tf_example(label_and_data_info, directory):
    path = os.path.join(directory, label_and_data_info["name"] + ".jpg")

    im = Image.open(path)
    width, height = im.size

    filename = label_and_data_info["name"].encode('utf8')  # Filename of the image. Empty if image is not from file
    image_format = b'jpeg'  # b'jpeg' or b'png'

    # tf.io.gfile.GFile also exists in TF 2.x, where tf.gfile does not
    with tf.io.gfile.GFile(path, 'rb') as fid:
        encoded_jpg = fid.read()

    key = hashlib.sha256(encoded_jpg).hexdigest()

    xmins = label_and_data_info["xmins"]  # list of floats between 0.0 and 1.0
    xmaxs = label_and_data_info["xmaxs"]  # list of floats between 0.0 and 1.0
    ymins = label_and_data_info["ymins"]  # list of floats between 0.0 and 1.0
    ymaxs = label_and_data_info["ymaxs"]  # list of floats between 0.0 and 1.0
    classes_text = [x.encode('utf8') for x in label_and_data_info["labels"]]  # list of strings ("Car" or "Person")
    classes = label_and_data_info["classes"]  # list of integers (0 or 1)

    tf_label_and_data = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_label_and_data
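Both integer pixel coordinates and normalized 0 to 1 values come up in this thread. create_pascal_tf_record.py normalizes by dividing each pixel coordinate by the image width or height; the sketch below does the same for pixel boxes and adds a sanity check that catches inverted or out-of-image boxes before they reach a TFRecord. The function name and the (xmin, ymin, xmax, ymax) tuple layout are assumptions.

```python
def normalize_boxes(boxes, width, height):
    """Turn pixel (xmin, ymin, xmax, ymax) boxes into four lists of values in [0, 1]."""
    xmins, ymins, xmaxs, ymaxs = [], [], [], []
    for xmin, ymin, xmax, ymax in boxes:
        # Reject boxes that are inverted, empty or outside the image before dividing.
        if not (0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height):
            raise ValueError(
                f"bad box {(xmin, ymin, xmax, ymax)} for a {width}x{height} image"
            )
        xmins.append(xmin / width)
        ymins.append(ymin / height)
        xmaxs.append(xmax / width)
        ymaxs.append(ymax / height)
    return xmins, ymins, xmaxs, ymaxs
```

The four returned lists line up with the xmins, ymins, xmaxs and ymaxs that create_tf_example above expects.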

ancoca13 commented on May 18, 2024

I think I'm facing the same issue.

I get this error when I try to train the model:

tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __inference_Dataset_map_<lambda>_14279}} Paddings must be non-negative: 0 -13 [[{{node Pad}}]] [[IteratorGetNext]] [Op:__inference_distributed_function_34604]

It seems like if the tf record has any error. I've seen that this error appears when the bounding box points are specified with integers, but I have used normalized points with values between 0 and 1 so I don't really know whats the problems.

This is the code that I use to generate each trecord:

```python
import hashlib

import tensorflow as tf
from PIL import Image
from object_detection.utils import dataset_util  # assumed source: TF Object Detection API


def create_tf_example(label_and_data_info, directory):
    path = directory + "/" + label_and_data_info["name"] + ".jpg"

    im = Image.open(path)
    width, height = im.size

    filename = label_and_data_info["name"].encode('utf8')  # Filename of the image. Empty if image is not from file
    image_format = b'jpeg'  # b'jpeg' or b'png'

    # tf.gfile was removed in TF 2; tf.io.gfile is its replacement
    with tf.io.gfile.GFile(path, 'rb') as fid:
        encoded_jpg = fid.read()

    key = hashlib.sha256(encoded_jpg).hexdigest()

    xmins = label_and_data_info["xmins"]  # list of floats between 0.0 and 1.0
    xmaxs = label_and_data_info["xmaxs"]  # list of floats between 0.0 and 1.0
    ymins = label_and_data_info["ymins"]  # list of floats between 0.0 and 1.0
    ymaxs = label_and_data_info["ymaxs"]  # list of floats between 0.0 and 1.0
    classes_text = [x.encode('utf8') for x in label_and_data_info["labels"]]  # list of strings ("Car" or "Person")
    classes = label_and_data_info["classes"]  # list of integers (0 or 1)

    tf_label_and_data = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_label_and_data
```
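
You can check the assumptions in the comments of `create_tf_example` (normalized floats, one class index per box) before anything is serialized. Below is a minimal sketch. `validate_annotation` is a hypothetical helper. Both the `num_classes` range check and the `max_boxes=100` limit are assumptions about what the training side expects, not values taken from yolov3-tf2.

```python
def validate_annotation(info, num_classes, max_boxes=100):
    """Return a list of problems found in one label_and_data_info dict (empty if none).

    Hypothetical helper: expects the same keys that create_tf_example reads.
    max_boxes=100 is an assumed limit, not a value read from yolov3-tf2.
    """
    per_box_keys = ["xmins", "xmaxs", "ymins", "ymaxs", "labels", "classes"]
    lengths = {k: len(info[k]) for k in per_box_keys}
    if len(set(lengths.values())) != 1:
        # Every box needs exactly one entry in each list
        return [f"per-box lists have different lengths: {lengths}"]

    problems = []
    n_boxes = lengths["xmins"]
    if n_boxes > max_boxes:
        problems.append(f"{n_boxes} boxes exceeds max_boxes={max_boxes}")

    boxes = zip(info["xmins"], info["ymins"], info["xmaxs"], info["ymaxs"])
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        # Normalized coordinates with strictly positive width and height
        if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
            problems.append(f"box {i} is not a valid normalized box: {(x0, y0, x1, y1)}")

    for i, c in enumerate(info["classes"]):
        # Class indices must fit the model's num_classes outputs
        if not 0 <= c < num_classes:
            problems.append(f"box {i} has class {c}, outside [0, {num_classes})")
    return problems


# Usage: one valid box, then the same box with xmin and xmax swapped
info = {"name": "img001", "xmins": [0.1], "xmaxs": [0.4], "ymins": [0.2], "ymaxs": [0.6],
        "labels": ["Car"], "classes": [0]}
print(validate_annotation(info, num_classes=2))  # []
swapped = dict(info, xmins=[0.4], xmaxs=[0.1])
print(validate_annotation(swapped, num_classes=2))  # one problem reported for box 0
```

Running this over every annotation and printing only the non-empty results keeps the output short even for large datasets.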

I've been able to get past this error (now I'm getting others) by defining the num_classes parameter.
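
The class count has to agree everywhere: the class-names file, the num_classes setting, and any pretrained weights whose output layers are loaded. A quick consistency check can therefore save a training run. Below is a minimal sketch. It assumes the training setup reads class names from a plain-text file with one name per line (like a `.names` file). `check_class_file` is a hypothetical helper.

```python
import os
import tempfile


def check_class_file(names_path, num_classes):
    """Return the class names in names_path, raising if their count differs from num_classes.

    Hypothetical helper; assumes one class name per line (blank lines are ignored).
    """
    with open(names_path, encoding="utf-8") as f:
        names = [line.strip() for line in f if line.strip()]
    if len(names) != num_classes:
        raise ValueError(
            f"{names_path} lists {len(names)} classes but num_classes={num_classes}")
    return names


# Usage: a two-class names file checked against num_classes=2
with tempfile.NamedTemporaryFile("w", suffix=".names", delete=False, encoding="utf-8") as f:
    f.write("Car\nPerson\n")
print(check_class_file(f.name, 2))  # ['Car', 'Person']
os.remove(f.name)
```

The YOLO output layers are sized by the number of classes. As a result, weights pretrained on 80 classes only load fully into an 80-class model. With 2 classes, either train from scratch or use a transfer mode that skips the class-dependent output layers, if your version of the training script offers one.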

ancoca13 commented on May 18, 2024

Correction: my previous problems were caused by trying to train a 2-class model with the 80-class pretrained weights.
Now that I have selected the 'none' transfer mode to train from scratch, I get the same error again after training for a while:

Paddings must be non-negative: 0 -13 [[{{node Pad}}]] [[IteratorGetNext]] [Op:__inference_distributed_function_39850]

cole8888 commented on May 18, 2024

@ancoca13 If I were you, I'd switch to a TensorFlow 1.x package. As of this writing, TensorFlow 2 cannot run object detection reliably on all platforms.

I've had success with https://github.com/AntonMu/TrainYourOwnYOLO if you'd like to try it out.

WHBSmith commented on May 18, 2024

@ancoca13 I had this exact same issue. Stop using your own script to generate the TFRecords and use VoTT instead; your script is wrong. Problem solved. If you want, you can reverse-engineer a script for creating the TFRecord files once you've created some with VoTT, but object detection is perfectly reliable on TensorFlow 2.
