aloyschen / tensorflow-yolo3
tensorflow implementation of yolov3
dataset = dataset.repeat().shuffle(70000).batch(batch_size).prefetch(batch_size)
I tested the shuffle function, and I believe buffer_size determines the maximum index of the original data that can be sampled. My dataset is huge, so when I train the model it gets stuck at around 40k, like this:
2018-11-21 21:07:14.170579: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 46287 of 70000
2018-11-21 21:07:24.262936: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 46432 of 70000
No more logs appear after that.
Any suggestions would be appreciated!
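As a rough illustration of how a fixed-size shuffle buffer behaves, the sketch below simulates one in plain Python. It is not TensorFlow's implementation, and `buffered_shuffle` and its parameters are hypothetical names. It shows two properties consistent with the log above: nothing is emitted until the buffer is full, and the k-th emitted element always comes from the first buffer_size + k inputs.

```python
import random


def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in the order a fixed-size shuffle buffer would emit them."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)      # fill phase: nothing is emitted yet
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]            # emit one random element from the buffer
        buffer[idx] = item           # refill the freed slot with the new item
    rng.shuffle(buffer)              # input exhausted: drain the remainder
    yield from buffer


out = list(buffered_shuffle(range(1000), buffer_size=10))
print(out[:5])
```

Under this model, a smaller buffer_size lets the first batch start sooner, at the cost of a less uniform shuffle.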
When computing raw_true_wh in yolo_loss, the invalid grid cells are usually set to 0. Why are they set to 1 here?
raw_true_wh = tf.log(tf.where(tf.equal(y_true[index][..., 2:4] / anchors[anchor_mask[index]] * input_shape[::-1], 0), tf.ones_like(y_true[index][..., 2:4]), y_true[index][..., 2:4] / anchors[anchor_mask[index]] * input_shape[::-1]))
In other projects:
raw_true_wh = K.log(y_true[l][..., 2:4] / anchors[anchor_mask[l]] * input_shape[::-1])
raw_true_wh = K.switch(object_mask, raw_true_wh, K.zeros_like(raw_true_wh)) # avoid log(0)=-inf
Here, the invalid wh is set to 0.
Can anyone explain this? Thanks.
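One way to compare the two snippets is to note that log(1) = 0, so replacing a zero ratio with 1 before the log yields the same 0 that the mask-based version writes afterwards. The sketch below checks this on plain Python lists. The function names and sample values are hypothetical, and it assumes the ratio is zero exactly where the object mask is false.

```python
import math


def raw_wh_where(ratio):
    # Replace zero ratios with 1 before the log, so empty cells become log(1) = 0.
    return [math.log(1.0 if r == 0 else r) for r in ratio]


def raw_wh_masked(ratio, object_mask):
    # Take the log only where an object exists and write 0 elsewhere.
    return [math.log(r) if m else 0.0 for r, m in zip(ratio, object_mask)]


ratio = [0.0, 1.5, 0.0, 0.8]           # sample true_wh / anchor * input_shape values
mask = [False, True, False, True]      # object present only where ratio is nonzero

print(raw_wh_where(ratio))
print(raw_wh_masked(ratio, mask))
```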
Hi, running your implementation produces the following error:
InternalError (see above for traceback): Blas SGEMM launch failed : m=173056, n=32, k=64
[[node darknet53/conv2d_3/Conv2D (defined at /ZFS4T/hitzht/tensorflow-yolo3-master-voc/model/yolo3_model.py:109) = Conv2D[T=DT_FLOAT, _class=["loc:@darknet53/batch_normalization_3/cond/FusedBatchNorm_1/Switch"], data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](darknet53/LeakyRelu_1, darknet53/conv2d_3/kernel/read)]]
[[{{node while/strided_slice_1/stack/_615}} = _HostRecvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3963_while/strided_slice_1/stack", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
When I run train() on the CPU, the error disappears. I hope you can give me some advice. Thanks.
How long did training take to complete?
Please let me know the training details.
new_high = new_high * tf.minimum(input_width / new_width, input_high / new_high)
new_width = new_high * tf.minimum(input_width / new_width, input_high / new_high)
Line 2: I think it should be new_width * tf.minimum(...).
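A minimal sketch of an aspect-preserving resize, assuming the intent is to scale both sides by the same factor. The function name `letterbox_size` is hypothetical, and rounding to integers is an assumption. Computing the scale once from the original sizes also avoids reusing an already-updated new_high in the second line.

```python
def letterbox_size(image_width, image_high, input_width, input_high):
    """Return the resized (width, high) that fits inside the input shape."""
    # Compute the scale once, from the ORIGINAL image size.
    scale = min(input_width / image_width, input_high / image_high)
    new_width = round(image_width * scale)
    new_high = round(image_high * scale)
    return new_width, new_high


print(letterbox_size(832, 416, 416, 416))
```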
How about the loss? Thanks!
Hi,
I have a problem. I ran the code on a single GTX 1080 Ti, followed the steps you described, and tested the image named dog.jpg. However, prediction took a few seconds, while darknet takes only about 22 ms.
Is there anything wrong with my setup? What performance do you expect?
Thanks very much!
When I use this code to train a model on the COCO 2014 dataset, memory usage keeps increasing until the process is killed.
In yolo3_model.py, in the function yolo_loss, at the end you are dividing by the yolo_output shape:
class_loss = tf.reduce_sum(class_loss) / tf.cast(tf.shape(yolo_output[0])[0], tf.float32)
Shouldn't it be yolo_output[index] instead of yolo_output[0]? You are using it for all the losses.
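A small sketch of what the divisor evaluates to, assuming the first dimension of every yolo_output[i] is the batch size. The shapes below are hypothetical (batch of 8, 416x416 input, 3 anchors per scale, 80 classes). Under that assumption, indexing with 0 or with index gives the same value.

```python
# Hypothetical output shapes for the three scales; all numbers are assumptions.
yolo_output_shapes = [
    (8, 13, 13, 3, 85),
    (8, 26, 26, 3, 85),
    (8, 52, 52, 3, 85),
]

# tf.shape(yolo_output[i])[0] corresponds to shape[0] here: the batch size.
batch_sizes = [shape[0] for shape in yolo_output_shapes]
print(batch_sizes)
```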
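During training, the validation loss drops to around 10, but in actual testing the computed object scores are very low, generally below 0.3, and the predicted box positions are not very accurate either.
In the yolo_head function: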
box_wh = tf.exp(predictions[..., 2:4]) * anchors_tensor / input_shape[::-1]
In the yolo_loss function:
raw_true_wh = tf.log(tf.where(tf.equal(y_true[index][..., 2:4] / anchors[anchor_mask[index]] * input_shape[::-1], 0), tf.ones_like(y_true[index][..., 2:4]), y_true[index][..., 2:4] / anchors[anchor_mask[index]] * input_shape[::-1]))
...
wh_loss = object_mask * box_loss_scale * 0.5 * tf.square(raw_true_wh - predictions[..., 2:4])
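My understanding is that taking the log here removes the influence of the absolute box wh on the correctness of the predicted box, and that it is not the logarithm from the box regression formula.
I hope you can clarify this.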
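The two expressions quoted above can be checked against each other numerically. The sketch below is a simplified scalar version, assuming a square input so that input_shape[::-1] reduces to a single number. `encode_wh` and `decode_wh` are hypothetical names, and the anchor and box values are made up. It suggests that the log in yolo_loss is the inverse of the exp in yolo_head.

```python
import math


def encode_wh(true_wh, anchor, input_size):
    # Mirrors raw_true_wh: log(true_wh / anchor * input_shape),
    # with true_wh normalized to [0, 1] and anchor in pixels.
    return [math.log(t / a * input_size) for t, a in zip(true_wh, anchor)]


def decode_wh(pred, anchor, input_size):
    # Mirrors box_wh: exp(pred) * anchor / input_shape.
    return [math.exp(p) * a / input_size for p, a in zip(pred, anchor)]


anchor = (116.0, 90.0)     # hypothetical anchor, in pixels
true_wh = (0.40, 0.25)     # hypothetical normalized box width and height
raw = encode_wh(true_wh, anchor, 416)
print(raw, decode_wh(raw, anchor, 416))
```

In this sketch, a box whose size equals the anchor encodes to a value close to 0, so the regression target measures the size relative to the anchor.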
I tried to train this model without darknet53.weights, running directly on the COCO dataset. After finishing 5 epochs with the model saved, I found that I can't correctly load the weights from my previously trained model. The loss starts from 0 and keeps rising. Can you tell me why?
I want to generate a .pb file from the ckpt generated during training.
My issue: I can't find the input node and the output node(s) of this implementation of YOLOv3.
I exported all the nodes to a text file, but there are more than 10,000 of them, and I am even more confused.
I have modified your code to train and predict with 5 anchors. Training works fine, but the detect part fails with:
AbortedError (see above for traceback): Operation received an exception:Status: 3, message: could not create a dilated convolution forward descriptor, in file tensorflow/core/kernels/mkl_conv_ops.cc:1111
[[node darknet53/conv2d_2/Conv2D (defined at D:\yolo_2\tensorflow-yolo3-master\tensorflow-yolo3-master\model\yolo3_model.py:110) ]]
Any help will be appreciated!!!!
Thanks
Hello, when I run detect.py I get: AttributeError: module 'tensorflow' has no attribute 'glorot_uniform_initializer'. However, this function does exist in the tf files. How can I solve this?
I'm running your code on a webcam and it is very slow (On CPU). I'm wondering if there are parts of the detection code that I can optimize to run the code faster on CPU.
From line 99 to 113 in dataReader.py, the IoU computation only uses the wh of the boxes. As far as I know, when we calculate the IoU between two boxes, all of x, y, w, h should be used. Am I misunderstanding something?
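For context, a width/height-only IoU is what you get when two boxes are assumed to share the same center, which is the usual setup when matching a ground-truth box to anchors. Whether dataReader.py does exactly this is an assumption. The sketch below uses hypothetical names and made-up anchor values.

```python
def wh_iou(box_wh, anchor_wh):
    """IoU of two boxes that share the same center, given only (w, h)."""
    inter = min(box_wh[0], anchor_wh[0]) * min(box_wh[1], anchor_wh[1])
    union = box_wh[0] * box_wh[1] + anchor_wh[0] * anchor_wh[1] - inter
    return inter / union


anchors = [(10, 13), (33, 23), (116, 90)]   # hypothetical anchors, in pixels
box = (30, 25)                              # hypothetical ground-truth (w, h)

# Pick the anchor whose shape overlaps the box best.
best = max(range(len(anchors)), key=lambda i: wh_iou(box, anchors[i]))
print(best)
```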
@aloyschen I am trying to implement YOLOv3 using some ideas and modules from your code.
I can't get the loss of my model to converge. My training loss hovers around 10 even after 100 epochs on a dataset of 200 raccoon images.
I have dissected the model to contain only 2 scales, and I am using the pre-trained darknet-53 weights with no optimization running over the feature extractor.
I was wondering which dataset you trained the model on, how many epochs you ran, what the training loss looked like, and any other details of the run in which your model converged and started giving reasonable predictions.
All the training details are provided in the following cfg:
num_parallel_calls = 4
input_shape = 416
max_boxes = 20
jitter = 0.3
hue = 0.1
sat = 1.0
cont = 0.8
bri = 0.1
norm_decay = 0.99
weight_decay = 5e-4
norm_epsilon = 1e-4
pre_train = True
train_last_layers_only = False
num_anchors = 6
num_classes = 1
training = True
disect = True
disect_scale = 1
ignore_thresh = .5
learning_rate = 1e-4
train_batch_size = 10
val_batch_size = 4
# train_num = 4761
# val_num = 250
train_num = 190
val_num = 10
Epoch = 200
obj_threshold = 0.3
nms_threshold = 0.5
gpu_index = "0"
log_dir = './logs'
data_dir = './dataset/'
model_dir = './converted/'
yolov3_cfg_path = './darknet_data/yolov3.cfg'
yolov3_weights_path = './darknet_data/yolov3.weights'
darknet53_weights_path = './darknet_data/darknet53.weights'
anchors_path = './yolo_anchors.txt'
classes_path = './model_data/raccoon_classes.txt'
train_annotations_file = './train.txt'
val_annotations_file = './val.txt'
output_dir = './tfrecords/'
Tensorboard screenshots are attached below
bbox = tf.cond(tf.greater(tf.shape(bbox)[0], 20), lambda: bbox[:20], lambda: tf.pad(bbox, paddings = [[0, 20 - tf.shape(bbox)[0]], [0, 0]], mode = 'CONSTANT'))
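From Preprocess in data_reader.
I used your code with pre_train set to True (using the darknet53 pretrained weights) to train the model, and the final loss kept fluctuating around 32. Testing with the eval function gave an mAP of 23.43. If I configure it to test with yolov3.weights instead, the mAP is 48.14. The gap is large. Is there anything to watch out for during training? Is there any way to improve the mAP?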
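The tf.cond expression above keeps at most 20 boxes and zero-pads shorter lists up to 20 rows. A plain Python sketch of the same logic follows. The name `pad_or_truncate` and the 5-value box layout are assumptions made for illustration.

```python
def pad_or_truncate(boxes, max_boxes=20, box_len=5):
    """Return exactly max_boxes rows: cut extras, or append all-zero rows."""
    if len(boxes) > max_boxes:
        return boxes[:max_boxes]
    padding = [[0] * box_len for _ in range(max_boxes - len(boxes))]
    return boxes + padding


few = [[10, 20, 50, 60, 1]] * 3     # 3 hypothetical boxes
many = [[10, 20, 50, 60, 1]] * 25   # 25 hypothetical boxes
print(len(pad_or_truncate(few)), len(pad_or_truncate(many)))
```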
I want to know what rule is used to generate the .tfrecords file.
Thank you very much for your reply.
The w and h do not change after padding when processing images, so xmax and ymax should not need the padding offset dx/dy added. I don't know if I am misunderstanding something.
xmin = xmin * new_width / image_width + dx
xmax = xmax * new_width / image_width + dx
ymin = ymin * new_high / image_high + dy
ymax = ymax * new_high / image_high + dy
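A small numeric check of the four lines above, using made-up sizes and a hypothetical helper name. In this sketch the same offset is added to both corners, so it cancels in xmax - xmin: the box width stays equal to the scaled width while the whole box moves with the padded image.

```python
def transform_box(box, image_size, new_size, offset):
    """Map a box from original image coordinates to the padded, resized image."""
    xmin, ymin, xmax, ymax = box
    image_width, image_high = image_size
    new_width, new_high = new_size
    dx, dy = offset
    return (
        xmin * new_width / image_width + dx,
        ymin * new_high / image_high + dy,
        xmax * new_width / image_width + dx,
        ymax * new_high / image_high + dy,
    )


# Hypothetical case: 640x480 image resized to 416x312, padded by 52 px on top.
new_box = transform_box((100, 50, 300, 250), (640, 480), (416, 312), (0, 52))
print(new_box)
```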
Hello, thank you very much for your code. Could you tell me how to use this function to get three different real borders? Looking forward to your help.
Should it be set to 0.5? Or can 0.3 boost the performance?
Hi, I received an error when running 'yolo_train.py'. Have you seen a similar situation? Thanks very much!