
east's Issues

classification loss function

Thanks for sharing this excellent repo. I noticed that the classification loss function used in the code is different from the one in the paper: you use the dice coefficient instead of cross entropy. Could you provide more detail on this part?

May be some error in locality_aware_nms.py

In line 68 of locality_aware_nms.py, you have written:
return standard_nms(np.array(polys), thres)
However, in the paper, the author wrote:
return STANDARDNMS(S)
Does this give better performance, or is it just an error?
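The two lines being compared are the last step of locality-aware NMS. The algorithm scans geometries in row order and merges each one into its predecessor with a score-weighted average when their IoU exceeds the threshold. It then runs standard NMS over the merged set. The sketch below is a simplified illustration under loud assumptions. It uses axis-aligned boxes `[x1, y1, x2, y2, score]` and a hand-written IoU, whereas locality_aware_nms.py works with quadrilaterals and polygon IoU. All function names here are hypothetical.

```python
import numpy as np


def iou(a, b):
    """IoU of two axis-aligned boxes [x1, y1, x2, y2, ...] (simplification)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def weighted_merge(a, b):
    """Coordinates averaged with scores as weights; the scores are summed."""
    merged = np.empty(5)
    merged[:4] = (a[4] * a[:4] + b[4] * b[:4]) / (a[4] + b[4])
    merged[4] = a[4] + b[4]
    return merged


def standard_nms(boxes, thres):
    """Greedy NMS: keep the highest score, drop boxes overlapping it too much."""
    order = np.argsort(boxes[:, 4])[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = [j for j in order[1:] if iou(boxes[i], boxes[j]) <= thres]
        order = np.array(rest, dtype=int)
    return boxes[keep]


def locality_aware_nms(boxes, thres=0.3):
    """boxes: (N, 5) array, assumed to be in row-major scan order."""
    merged = []
    prev = None
    for g in boxes:
        if prev is not None and iou(g, prev) > thres:
            prev = weighted_merge(g, prev)  # merge into the running geometry
        else:
            if prev is not None:
                merged.append(prev)
            prev = np.asarray(g, dtype=float)
    if prev is not None:
        merged.append(prev)
    if not merged:
        return np.zeros((0, 5))
    # final pass over the merged set S: the standard_nms / STANDARDNMS(S) step
    return standard_nms(np.array(merged), thres)
```

In this sketch, `np.array(polys)` and `S` play the same role: both are the collection of merged geometries handed to the final NMS.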

Text scale

I have a lot of text regions with different dimensions and scales. How can I set the parameter "TEXT SCALE" correctly?
How do I choose the right number?

And does it depend on the parameter "INPUT SIZE"?
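One way to reason about the value rests on an assumption this thread does not confirm: that the geometry head ends in a sigmoid multiplied by text_scale. If so, no predicted distance from a pixel to a box edge can exceed text_scale pixels in the resized network input. text_scale would then need to be at least the largest such distance. That bound does depend on INPUT SIZE, because boxes shrink or grow with the resize. The sketch estimates the bound from box sizes. Its resize rule (longest image side scaled to `input_size`) and the function name are hypothetical. The real training pipeline also applies random scaling and cropping.

```python
def required_text_scale(box_sizes, image_sizes, input_size):
    """Largest pixel-to-edge distance expected after resizing (rough bound).

    box_sizes:   list of (w, h) text-box sizes in original pixels
    image_sizes: list of (W, H) of the image each box came from
    input_size:  side length the network input is resized to
    """
    worst = 0.0
    for (w, h), (W, H) in zip(box_sizes, image_sizes):
        scale = input_size / max(W, H)  # hypothetical resize rule
        # a pixel inside a box is never farther than max(w, h) from an edge
        worst = max(worst, max(w, h) * scale)
    return worst
```

For example, a 100-pixel-wide word in a 1024-pixel image maps to at most 50 pixels at `input_size=512`, but to 100 pixels at `input_size=1024`.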

Training Issue

Hi!!!

I have been trying to start training this model on Arabic datasets, but as soon as I start it, it gives me this:
"poly in wrong direction"

The dataset consists of 9 values per region: the corners (x0, y0) to (x3, y3) in clockwise order, and one word describing the selected region.

I am using TensorFlow v1.2 with Python 3.5, and I have successfully run the demo on my server.

Please guide me on this issue.

Thanks
Burhan Ul Tayyab
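A message like "poly in wrong direction" suggests the loader checks the vertex order of each quadrilateral. Whether a box counts as "clockwise" depends on the axis convention. In image coordinates the y axis points down, so a box that is clockwise in a y-up plot is counterclockwise on screen. The sketch assumes the check uses the sign of a shoelace-style area and reorders offending boxes. The function names are hypothetical, and this is not necessarily how icdar.py implements it.

```python
import numpy as np


def signed_area(poly):
    """Shoelace-style signed area of a 4-point polygon in image coords (y down).

    With this edge formula, a box that is clockwise as drawn on screen
    gets a negative value.
    """
    poly = np.asarray(poly, dtype=float)
    x, y = poly[:, 0], poly[:, 1]
    x_next, y_next = np.roll(x, -1), np.roll(y, -1)
    return np.sum((x_next - x) * (y_next + y)) / 2.0


def ensure_clockwise(poly):
    """Return the polygon in on-screen clockwise order, reversing it if needed."""
    poly = np.asarray(poly, dtype=float)
    if signed_area(poly) > 0:
        # keep the first vertex and reverse the rest: p0, p3, p2, p1
        poly = poly[[0, 3, 2, 1]]
    return poly
```

Running `signed_area` over a few label files is a quick way to see which convention they actually follow.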

weird training problems!

I'm trying to run multigpu_train.py on Ubuntu 14.04, Python 2.7.6 and TensorFlow 1.1.0, but I got an incomprehensible error:
KeyError: 'pool4'
[Screenshot from 2017-09-29 11:57:57]

[RFC] Roadmap

To serve better as a baseline for further research and those who just want a fast text detector, we are planning to polish this repo from "just works" to "works great". Here are our current plans:

  • C++ NMS matches the result of Python (slow) version
  • Python {2,3} compatible, for both training and evaluation
  • Web demo (offline)
  • Deploy web demo in cloud (east.zxytim.com)
  • Result viewer for web demo
  • Chinese text detection
  • Integrate state-of-the-art small base models to run even faster on CPU/embedded device.
  • Quadrangle prediction
  • Script for easy evaluation
  • Video demo

As we both have full-time jobs, this roadmap is not bound to a timetable. If you want to take one of the tasks above, please open a dedicated issue for that task and kindly submit a pull request.

Also, any suggestions are warmly welcomed.

could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

Hi, @argman

I have the following problem during training:

....
2017-08-31 18:22:46.750620: E tensorflow/stream_executor/cuda/cuda_dnn.cc:371] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2017-08-31 18:22:46.750670: E tensorflow/stream_executor/cuda/cuda_dnn.cc:338] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-08-31 18:22:46.750683: F tensorflow/core/kernels/conv_ops.cc:672] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)

Do you know this problem? (I used 2 GPUs.)

How to reproduce the performance of f1-score=80.83 on ICDAR2015?

Hi,

I can't reproduce the 80.83 F1-score when directly running the following on the ICDAR2015+2013 training images:

python multigpu_train.py --gpu_list=0 --input_size=512 --batch_size_per_gpu=14 --checkpoint_path=/tmp/east_icdar2015_resnet_v1_50_rbox/ \
--text_scale=512 --training_data_path=/data/ocr/icdar2015/ --geometry=RBOX --learning_rate=0.0001 --num_readers=24 \
--pretrained_model_path=/tmp/resnet_v1_50.ckpt

Could you please tell me the parameter configuration of the experiment that achieved the 80.83 F1-score? In particular: 1. the batch size per GPU; 2. the number of GPUs; 3. the initial learning rate; 4. the number of steps you trained the model for.

Thank you very much.
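An F-score such as 80.83 is the harmonic mean of precision and recall. A shortfall can therefore come from either term, and it is worth comparing the two separately rather than only the combined number. A minimal helper (the name is hypothetical):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```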

about PVANet training

Hi, have you tried PVANet as the base network? I tried PVANet using Caffe but ran into an overfitting problem.
My training set is 950 images from the ICDAR 2015 training set (the other 50 images serve as the validation set) plus 229 images from ICDAR 2013.
The model is trained with online data augmentation, including scaling and rotations between ±30°. The IoU loss overfits a lot: when the training IoU loss descends to 0.25, the validation IoU loss still stays high at 0.7. I have checked everything so thoroughly that I don't know what else to try. Please help me, Mr. Argman!!!!!! I have spent two months on this problem.... (sob)
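When comparing training and validation curves, it helps to recall what the geometry loss measures. Below is a NumPy sketch of the axis-aligned IoU loss as described in the EAST paper, L_AABB = -log IoU, together with the angle term 1 - cos(θ_pred - θ_gt). Each pixel predicts its distances to the top, right, bottom and left edges. Because the predicted and ground-truth boxes share that pixel, the intersection follows from element-wise minima. The +1 smoothing constant and the function names are assumptions made for this illustration.

```python
import numpy as np


def aabb_iou_loss(d_gt, d_pred, eps=1.0):
    """-log IoU for boxes that share a reference pixel.

    d_gt, d_pred: arrays of shape (..., 4) with distances from the pixel to
    the top, right, bottom and left edges. eps smooths intersection and union.
    """
    d_gt = np.asarray(d_gt, dtype=float)
    d_pred = np.asarray(d_pred, dtype=float)
    t1, r1, b1, l1 = np.moveaxis(d_gt, -1, 0)
    t2, r2, b2, l2 = np.moveaxis(d_pred, -1, 0)
    area_gt = (t1 + b1) * (r1 + l1)
    area_pred = (t2 + b2) * (r2 + l2)
    w_inter = np.minimum(r1, r2) + np.minimum(l1, l2)
    h_inter = np.minimum(t1, t2) + np.minimum(b1, b2)
    inter = w_inter * h_inter
    union = area_gt + area_pred - inter
    return -np.log((inter + eps) / (union + eps))


def angle_loss(theta_gt, theta_pred):
    """1 - cos of the angle error."""
    return 1.0 - np.cos(np.asarray(theta_pred) - np.asarray(theta_gt))
```

A validation value of 0.7 under this definition corresponds to an IoU of roughly exp(-0.7) ≈ 0.5, compared with about exp(-0.25) ≈ 0.78 on the training set.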

Error occurs when trying eval.py

When I try to use eval.py, an error occurs. It looks like adaptor.so may be broken (e.g. compiled with an unsuitable g++). I'm using g++ 5.4.0.

The error report looks like this:

Find 1 images
40795 text boxes before nms
Traceback (most recent call last):
  File "eval.py", line 194, in <module>
    tf.app.run()
  File "/home/aqua/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "eval.py", line 160, in main
    boxes, timer = detect(score_map=score, geo_map=geometry, timer=timer)
  File "eval.py", line 98, in detect
    boxes = lanms.merge_quadrangle_n9(boxes.astype('float32'), nms_thres)
  File "/home/aqua/EAST/lanms/__init__.py", line 12, in merge_quadrangle_n9
    from .adaptor import merge_quadrangle_n9 as nms_impl
ImportError: /home/aqua/EAST/lanms/adaptor.so: undefined symbol: _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_M_replaceEmmPKcm

Can you help me fix it?

Tried to write class-balanced cross entropy, but couldn't understand it.

I just want to change the loss function from the dice coefficient to class-balanced cross entropy, but I still don't understand what to change.

import numpy as np
import tensorflow as tf


def batch_flatten(x):
    """
    Flatten the tensor except the first dimension.
    """
    shape = x.get_shape().as_list()[1:]
    if None not in shape:
        return tf.reshape(x, [-1, int(np.prod(shape))])
    return tf.reshape(x, tf.stack([tf.shape(x)[0], -1]))


def xentropy(y_true_cls, y_pred_cls,
             training_mask):
    eps = 1e-7

    z = batch_flatten(y_pred_cls)                             # predicted text probability
    y = tf.cast(batch_flatten(y_true_cls), tf.float32)        # ground-truth score map
    mask = tf.cast(batch_flatten(training_mask), tf.float32)  # flattened so shapes match y

    # beta = share of negative (background) pixels among the unmasked ones
    count_neg = tf.reduce_sum((1. - y) * mask)
    count_pos = tf.reduce_sum(y * mask)
    beta = count_neg / (count_neg + count_pos + eps)

    # apply the mask inside both terms so masked pixels do not contribute
    loss_pos = -beta * tf.reduce_mean(y * tf.log(z + eps) * mask)
    loss_neg = (1. - beta) * tf.reduce_mean((1. - y) * tf.log(1. - z + eps) * mask)
    cost = tf.subtract(loss_pos, loss_neg, name='classification_xentropy_loss')

    return cost

Would this code work?
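To check a TensorFlow version on tiny inputs, a framework-free NumPy reference of a masked class-balanced cross entropy may help. It is a sketch under stated assumptions. β is the fraction of unmasked pixels that are background. The mask multiplies each pixel's loss. The result is averaged over all pixels, so masked ones contribute zero. The function name is hypothetical.

```python
import numpy as np


def balanced_xent_np(y_true, y_pred, mask, eps=1e-7):
    """Masked class-balanced cross entropy, averaged over all pixels."""
    y = np.asarray(y_true, dtype=float).ravel()
    z = np.asarray(y_pred, dtype=float).ravel()
    m = np.asarray(mask, dtype=float).ravel()
    n_pos = np.sum(y * m)
    n_neg = np.sum((1.0 - y) * m)
    beta = n_neg / (n_pos + n_neg + eps)  # large beta up-weights the rare text pixels
    per_pixel = -(beta * y * np.log(z + eps)
                  + (1.0 - beta) * (1.0 - y) * np.log(1.0 - z + eps))
    return np.mean(per_pixel * m)  # masked pixels contribute exactly zero
```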

Questions about restore_rectangle_rbox

The function restore_rectangle_rbox in icdar.py is so complicated that, even after spending a lot of time reading and studying it, I still can't understand it! Could you provide more information or comments about this function?
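The geometry that restore_rectangle_rbox has to undo is easier to see for a single pixel. Suppose RBOX geometry is four distances (top, right, bottom, left) from the pixel to the box edges, measured in the box's own rotated frame, plus a rotation angle θ. The corners are then the four offsets rotated by θ and shifted to the pixel. This is a hypothetical, non-vectorized sketch. The sign convention for θ and the vertex order used in icdar.py may differ, and the repository's version processes all pixels at once, so it can look quite different.

```python
import numpy as np


def restore_rbox(x, y, top, right, bottom, left, theta):
    """Corners of a rotated rectangle from one pixel's RBOX prediction.

    In the box's unrotated frame the pixel is at the origin, and the corners
    are at (-left, -top), (right, -top), (right, bottom), (-left, bottom).
    The offsets are rotated by theta and then translated to (x, y).
    """
    local = np.array([[-left, -top],
                      [right, -top],
                      [right, bottom],
                      [-left, bottom]], dtype=float)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s],
                    [s, c]])
    return local @ rot.T + np.array([x, y], dtype=float)
```

With θ = 0 this reduces to an axis-aligned box. For any θ the side lengths stay left + right and top + bottom, which is a handy sanity check when reading the vectorized code.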

More training details

Hi,

I am trying to exactly reproduce your released model. Could you provide some more details about the training? In the README it looks like you use 14 images per GPU, and I see you've mentioned training with 4 GPUs. Was your total batch size then 56? Did you adjust the learning rate at all for such a large batch size, or was the default one used?

Also, you mention using the ICDAR2013 training set as well. Is there anything special here, or is the sampling between ICDAR2015 and 2013 1:1?

Any more details that you think may be relevant?

Btw, a small typo in the README: "Thanks for the author's (@zxytim) help! Please site his paper if you find this useful." site -> cite

Thanks for releasing the code. It's great!

problem of choosing "max_side_len" in eval.py

@argman Hi, in eval.py, with the default max_side_len=2400 the inference result is strange: large text is not detected, yet even very small text is. However, when max_side_len is set to 512, the same as INPUT_SIZE, very large text is detected correctly, but small text is missed. Thanks!
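One way to reason about this: max_side_len only controls how much an image is shrunk before inference. A text line's size inside the network is therefore roughly its original size times the resize ratio. The sketch assumes the resize keeps the aspect ratio, never enlarges, caps the longer side at max_side_len and rounds each side down to a multiple of 32. The function name is hypothetical; check eval.py for the exact rule. For a 1000×2000 image, max_side_len=2400 leaves the scale at 1, so a 600-pixel-tall line stays about 600 pixels, probably far larger than anything in 512-pixel training crops. With 512, the same line shrinks to about 154 pixels, while 10-pixel text shrinks to under 3 pixels.

```python
def resized_shape(h, w, max_side_len):
    """(new_h, new_w, ratio) under the assumed inference resize rule."""
    ratio = min(1.0, float(max_side_len) / max(h, w))  # shrink only, never enlarge
    new_h, new_w = int(h * ratio), int(w * ratio)

    def snap(v):
        # round down to a multiple of 32, but keep at least 32 pixels
        return max(32, (v // 32) * 32)

    return snap(new_h), snap(new_w), ratio
```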

about balanced cross-entropy loss

The code uses the dice-coefficient loss rather than the balanced cross-entropy loss described in the paper. So I followed the paper and tried the balanced cross-entropy loss, but the performance with it is very poor and can't reach the result in the paper. I can't figure out why the dice-coefficient loss works better than the balanced cross-entropy loss.
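For reference, here are the two losses being compared, side by side. The first is the class-balanced cross entropy as defined in the EAST paper, with prediction Ŷ and ground truth Y*. The second is the dice loss as written in the dice_coefficient function quoted on this page, with m_i the training mask. The per-pixel notation is ours.

```latex
$$
L_{s} = -\beta\, Y^{*} \log \hat{Y} - (1 - \beta)\,(1 - Y^{*}) \log\bigl(1 - \hat{Y}\bigr),
\qquad
\beta = 1 - \frac{\sum_{y^{*} \in Y^{*}} y^{*}}{\lvert Y^{*} \rvert}
$$

$$
L_{\mathrm{dice}} = 1 - \frac{2 \sum_{i} y_{i}^{*}\, \hat{y}_{i}\, m_{i}}
                          {\sum_{i} y_{i}^{*} m_{i} + \sum_{i} \hat{y}_{i} m_{i} + \epsilon}
$$
```

Structurally, the cross entropy is a per-pixel sum in which β must compensate for the text/background imbalance. In the dice loss, background pixels that are correctly predicted as 0 contribute nothing to either the numerator or the denominator. This is a difference in form, not a demonstrated explanation of the gap reported above.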

Model Testing Issue

Hello !!!

I've successfully trained the model on an Arabic dataset. However, when I try to test the model, it just returns the same image as before without any text boxes. Can you please help me with that? I've checked the paths again and again, and they are correct.

Thanks
Burhan Ul Tayyab

Score Map Generation

In Section 3.3.1, the reference length is r_i = min(D(p_i, p_{(i mod 4)+1}), D(p_i, p_{((i+3) mod 4)+1})). When i = 1, r_1 = min(D(p_1, p_2), D(p_1, p_1)), so r_1 = 0, doesn't it? Can you explain this in more detail, or tell me which part of the code computes it? Thanks!
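The intended meaning is presumably that r_i is the length of the shorter of the two edges meeting at p_i, so the second vertex should be p_i's predecessor. In 1-based indexing that predecessor is p_{((i+2) mod 4)+1}, which gives p_4 for i = 1, not p_1. Under that reading, r_1 would not be 0. A minimal 0-based sketch of this interpretation follows. The function name is hypothetical, and whether icdar.py computes r_i exactly this way is not confirmed here.

```python
import numpy as np


def reference_lengths(poly):
    """r_i = shorter of the two edges that meet at vertex p_i.

    poly: (4, 2) vertices in order. Indices wrap around, so (0-based) the
    neighbours of vertex i are (i + 1) % 4 and (i - 1) % 4.
    """
    poly = np.asarray(poly, dtype=float)
    r = np.empty(4)
    for i in range(4):
        to_next = np.linalg.norm(poly[i] - poly[(i + 1) % 4])
        to_prev = np.linalg.norm(poly[i] - poly[(i - 1) % 4])
        r[i] = min(to_next, to_prev)
    return r
```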

Want to change dice coefficient function to class-balanced cross entropy function.

I tried to change this code to class-balanced cross entropy function.

import tensorflow as tf


def dice_coefficient(y_true_cls, y_pred_cls,
                     training_mask):
    '''
    dice loss
    :param y_true_cls:
    :param y_pred_cls:
    :param training_mask:
    :return:
    '''
    eps = 1e-5
    intersection = tf.reduce_sum(y_true_cls * y_pred_cls * training_mask)
    union = tf.reduce_sum(y_true_cls * training_mask) + tf.reduce_sum(y_pred_cls * training_mask) + eps
    loss = 1. - (2 * intersection / union)
    tf.summary.scalar('classification_dice_loss', loss)
    return loss

However, I don't understand why there is a training mask or what its role is. I would be thankful if somebody could explain :) Thanks
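The following NumPy illustration shows what multiplying by training_mask does. Any pixel where the mask is 0 drops out of both the intersection and the union, so whatever the network predicts there has no effect on the loss. This assumes, though the thread does not say so, that the mask is 0 on regions that should be ignored, for example boxes labelled ### or boxes too small to learn from, and 1 elsewhere. The function name is hypothetical.

```python
import numpy as np


def dice_loss(y_true, y_pred, mask, eps=1e-5):
    """NumPy version of the dice loss above, for experimenting with the mask."""
    y_true, y_pred, mask = (np.asarray(a, dtype=float) for a in (y_true, y_pred, mask))
    intersection = np.sum(y_true * y_pred * mask)
    union = np.sum(y_true * mask) + np.sum(y_pred * mask) + eps
    return 1.0 - 2.0 * intersection / union


# Two predictions that differ only on the masked-out last pixel give the same loss.
y = [1, 1, 0, 0]
m = [1, 1, 1, 0]
print(dice_loss(y, [0.9, 0.8, 0.1, 0.0], m) == dice_loss(y, [0.9, 0.8, 0.1, 1.0], m))
```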

question about fit_line()

@zxytim @argman I think one line of code may be wrong,
in function fit_line() of icdar.py:

import numpy as np


def fit_line(p1, p2):
    # fit a line ax+by+c = 0
    if p1[0] == p1[1]:  # Here I think it should be changed to: if p1[0] == p2[0]:
        return [1., 0., -p1[0]]
    else:
        [k, b] = np.polyfit(p1, p2, deg=1)
        return [k, -1., b]
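Note that np.polyfit(x, y, deg) takes the x-coordinates first and the y-coordinates second. The call np.polyfit(p1, p2, deg=1) therefore only makes sense if p1 holds the two x-coordinates and p2 the two y-coordinates. Under that reading, p1[0] == p1[1] tests whether both points share the same x, i.e. whether the line is vertical, and is not a typo. It is worth checking that this matches how icdar.py calls fit_line. Below is a sketch with the parameters renamed to make the reading explicit; the names xs and ys are hypothetical.

```python
import numpy as np


def fit_line(xs, ys):
    """Line a*x + b*y + c = 0 through (xs[0], ys[0]) and (xs[1], ys[1])."""
    if xs[0] == xs[1]:
        # vertical line x = x0  ->  1*x + 0*y - x0 = 0
        return [1., 0., -xs[0]]
    k, b = np.polyfit(xs, ys, deg=1)
    # y = k*x + b  ->  k*x - 1*y + b = 0
    return [k, -1., b]
```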

Training problem

I have a dataset of 12982 images.
When I started training on it with 24 readers, all I see is:

) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:23:00.0)
12982 training images in /media/rmmal/data/East_data/
(the line above repeated 24 times in total)

The GPU utilization is 95% and all 24 cores are at 100%.

It has been 1 hour so far and nothing has changed.

So has something gone wrong?

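can't compile lanms on Windows

Hi, when I run python run_demo_server.py on Windows, I get an error:
[two screenshots of the error]
It says the error is at import lanms. Looking at the lanms folder, importing it executes this code in __init__.py:
if subprocess.call(['make', '-C', BASE_DIR]) != 0: # return value
    raise RuntimeError('Cannot compile lanms: {}'.format(BASE_DIR))
The error occurs at this point. Do you know how to solve it? Thanks!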

the pre-trained model can't achieve 80% F1-score

I ran the east_icdar2015_resnet_v1_50_rbox model/model.ckpt-49491 on the ICDAR2015-TRW public_test_data, then evaluated the results with the detection_eval_tool, but only got a 0.4697 F1-score. I don't know what went wrong.

About the python version

It seems that we need TensorFlow with Python 3.x to support lanms, which eval.py uses. Is it possible to use TensorFlow with Python 2.7 to run eval.py?

Thanks.

Am I doing it right?

I changed the loss function and tried to train on my data with EAST. However, when I looked at how training was going, I found something weird.

[image: input data]

[image: corresponding score map]

The pictures above are the input data and the corresponding score map. Shouldn't the GT area be black and everything else white, rather than what the picture shows? (In the picture, the GT area is white and everything else is black.)
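A white ground-truth area is what one would expect if the score map stores 1 for text and 0 for background. The dice_coefficient loss quoted in an earlier issue on this page multiplies y_true_cls by the predicted text probability, which only makes sense with 1 on text pixels. Saved as an 8-bit image, 1 becomes 255, i.e. white. A sketch with axis-aligned boxes; the helper is hypothetical, and the EAST paper fills a shrunk version of each text region rather than the full box.

```python
import numpy as np


def score_map_image(shape, boxes):
    """Render a binary score map (1 = text, 0 = background) as an 8-bit image."""
    score = np.zeros(shape, dtype=np.float32)
    for x1, y1, x2, y2 in boxes:
        score[y1:y2, x1:x2] = 1.0  # text pixels
    return (score * 255).astype(np.uint8)  # text -> 255 (white), background -> 0 (black)
```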

Does `from icdar import restore_rectangle` take so much time?

When I run python3 eval.py or python3 run_demo_server.py, it executes from icdar import restore_rectangle. However, the terminal shows "1000 training images in ./data/train/", and then it seems to load these training images, which takes a lot of time. Does anybody else have the same situation?
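Python executes a module's top-level statements the first time the module is imported. So `from icdar import restore_rectangle` runs everything at the top level of icdar.py, not just that function's definition. The "training images" message suggests that something touching the dataset runs on import, although this thread does not confirm it. If so, eval.py and run_demo_server.py pay that cost too, and guarding the work with `if __name__ == "__main__":` avoids it. Below is a self-contained demonstration; the module names and helpers are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Module whose top level does some "work" every time it is imported.
UNGUARDED = '''
CALLS = []

def restore_rectangle():
    return "restored"

CALLS.append("top-level code ran")
'''

# Same module, but the work only runs when executed as a script.
GUARDED = '''
CALLS = []

def restore_rectangle():
    return "restored"

if __name__ == "__main__":
    CALLS.append("top-level code ran")
'''


def import_temp_module(name, source):
    """Write `source` to a temporary NAME.py and import it by name."""
    folder = tempfile.mkdtemp()
    Path(folder, name + ".py").write_text(source)
    sys.path.insert(0, folder)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(name)
    finally:
        sys.path.remove(folder)
```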

GPU usage is zero?

@argman @zxytim Hi, I've found a new problem: Volatile GPU-Util is 0%, but GPU memory usage is about 23 GB. I printed the running log and saw that the model is loading the dataset all the time. Why doesn't the model do any actual computation on the GPU?

nvidia-smi info:
| GPU  Name        Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|   5  Tesla M60 24GB     On     | 0000:85:00.0     Off |                  Off |
| N/A   35C    P0    56W / 250W  |  23377MiB / 24472MiB |      0%      Default |

problem in the output

I have trained a model with this command:

python multigpu_train.py --gpu_list=0,1,2 --input_size=512 --batch_size_per_gpu=14 --checkpoint_path=/backup/EAST/
--text_scale=1024 --training_data_path=/DATA/EAST/data/ --geometry=RBOX --learning_rate=0.0001 --num_readers=12

and I've waited until:

Step 007130, model loss 0.0316, total loss 0.0827, 7.33 seconds/step, 5.73 examples/second

First question: should I let it run more iterations, or is this enough?
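The figures in that log line are internally consistent, and a little arithmetic shows how to read them. With 3 GPUs × 14 images there are 42 images per step. 42 / 7.33 s ≈ 5.73 examples/second, and 7130 steps at 7.33 s/step is roughly 14.5 hours of training. How many passes over the data that represents depends on the dataset size, which is not given here. The helper names are hypothetical.

```python
def throughput(batch_per_gpu, num_gpus, sec_per_step, steps):
    """Global batch, examples/second and wall-clock hours implied by a log line."""
    batch = batch_per_gpu * num_gpus
    examples_per_sec = batch / sec_per_step
    hours = steps * sec_per_step / 3600.0
    return batch, examples_per_sec, hours


def epochs_seen(steps, batch, num_images):
    """Approximate number of passes over a dataset of num_images images."""
    return steps * batch / num_images
```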

Second question:
The outputs for all the images seem to be the same size. Why is this happening?
I couldn't see much variation in the output dimensions.

Examples:
[Screenshots from 2017-09-11 09:42:23, 09:42:37, 09:42:58 and 09:43:21]

So what's missing to be able to detect blocks of text?

cannot detect single characters

[image: aa_1]

I ran eval.py to detect text, but none of the single-digit numbers in this image are detected. Could someone tell me why? Thanks

Cannot compile lanms

I have .jpg images in a folder and I am trying to run eval.py. I have trained the model and have a checkpoint file.
The command I am using is: python3 eval.py --test_data_path=/home/kamranjanjua/EAST/icdarData/ --gpu_list=0 --checkpoint_path=/modelse/ --output_path=/home/kamranjanjua/EAST/output_icdar/

The icdarData folder contains the images.

However, when I run it, I get this error: raise RuntimeError('Cannot compile lanms: {}'.format(BASE_DIR))

Any solution?

slim.get_trainable_variables(),

Hello argman!
My TensorFlow version is 1.01, but I encounter this problem:

File "multigpu_train.py", line 135, in main
    variable_restore_op = slim.assign_from_checkpoint_fn(FLAGS.pretrained_model_path, slim.get_trainable_variables(),
AttributeError: 'module' object has no attribute 'get_trainable_variables'

I checked the TensorFlow slim API in IPython, and the function get_trainable_variables is not available in my version.

So maybe you should consider raising the required TF version.

Results on Arabic and Urdu Datasets

Hello Argman!!!!

Hope you are in the best of health. Here are some of the results, trained on the Arabic dataset:

[Result images 23 and 24]

Basically, our main purpose was to detect Urdu news tickers, so I'm sending you those too.

[Result images: output 7, output 5, output 4, output 3, output 2, output 1]

You can use the photos any way you want; just cite my GitHub link.

Anyway, I can also give you the model if you want!!!!

Thanks Again
Burhan Ul Tayyab

Problem during training

Hi, @argman
I get the following error during training:

...
Step 000830, model loss 0.0111, total loss 0.0264, 71.25 seconds/step, 0.39 examples/second
Step 000840, model loss 0.0121, total loss 0.0272, 71.00 seconds/step, 0.39 examples/second
Step 000850, model loss 0.0124, total loss 0.0274, 71.36 seconds/step, 0.39 examples/second
Step 000860, model loss 0.0130, total loss 0.0279, 71.22 seconds/step, 0.39 examples/second
Step 000870, model loss 0.0107, total loss 0.0255, 71.07 seconds/step, 0.39 examples/second
Step 000880, model loss 0.0109, total loss 0.0256, 70.99 seconds/step, 0.39 examples/second
Step
Traceback (most recent call last):
  File ".../EAST/icdar.py", line 657, in generator
    score_map, geo_map, training_mask = generate_rbox((new_h, new_w), text_polys, text_tags)
  File ".../EAST/icdar.py", line 520, in generate_rbox
    if point_dist_to_line(p1, new_p2, p0) > point_dist_to_line(p1, new_p2, p3):
  File ".../EAST/icdar.py", line 248, in point_dist_to_line
    return np.linalg.norm(np.cross(p2 - p1, p1 - p3)) / np.linalg.norm(p2 - p1)
TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'

But training proceeds without stopping.
Do you know anything about this problem? And is it a serious problem for the resulting model?

Training errors 'Cross point does not exist'

When I train the model on the ICDAR2015 dataset, I get this error:

Cross point does not exist
Traceback (most recent call last):
  File "/home/lairf/EAST/icdar.py", line 657, in generator
    score_map, geo_map, training_mask = generate_rbox((new_h, new_w), text_polys, text_tags)
  File "/home/lairf/EAST/icdar.py", line 520, in generate_rbox
    if point_dist_to_line(p1, new_p2, p0) > point_dist_to_line(p1, new_p2, p3):
  File "/home/lairf/EAST/icdar.py", line 248, in point_dist_to_line
    return np.linalg.norm(np.cross(p2 - p1, p1 - p3)) / np.linalg.norm(p2 - p1)
TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'

Why does this happen? Does it affect the training result?
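The traceback is consistent with an intersection helper returning None when two fitted edge lines are parallel, hence "Cross point does not exist". After that, new_p2 is None and `p2 - p1` raises the TypeError. If the data generator catches the exception and moves on, as the continuing training suggests, the affected sample is simply skipped. Below is a hypothetical sketch of an intersection routine that signals parallel lines explicitly, plus a distance function that refuses None, so a caller can skip such a polygon deliberately. The names mirror the traceback, but the bodies are assumptions, not the repository's code.

```python
import numpy as np


def line_cross_point(l1, l2, eps=1e-9):
    """Intersection of a1*x + b1*y + c1 = 0 and a2*x + b2*y + c2 = 0.

    Returns None when the lines are (nearly) parallel.
    """
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < eps:
        return None
    # Cramer's rule for a1*x + b1*y = -c1, a2*x + b2*y = -c2
    x = (b1 * c2 - b2 * c1) / det
    y = (a2 * c1 - a1 * c2) / det
    return np.array([x, y])


def point_dist_to_line(p1, p2, p3):
    """Distance from p3 to the line through p1 and p2; None if any point is missing."""
    if p1 is None or p2 is None or p3 is None:
        return None
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    d = p2 - p1
    # |2-D cross product of (p2 - p1) and (p1 - p3)| / |p2 - p1|
    return abs(d[0] * (p1[1] - p3[1]) - d[1] * (p1[0] - p3[0])) / np.linalg.norm(d)
```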

Question on EAST/icdar.py

import csv
import os

import numpy as np


def load_annoataion(p):
    text_polys = []
    text_tags = []
    if not os.path.exists(p):
        # return both arrays so callers can always unpack two values
        return np.array(text_polys, dtype=np.float32), np.array(text_tags, dtype=bool)
    with open(p, 'r') as f:
        reader = csv.reader(f)
        for line in reader:
            label = line[-1]
            # strip BOM. \ufeff for python3,  \xef\xbb\xbf for python2
            line = [i.strip('\ufeff').strip('\xef\xbb\xbf') for i in line]
            x1, y1, x2, y2, x3, y3, x4, y4 = list(map(float, line[:8]))
            text_polys.append([[x1, y1], [x2, y2], [x3, y3], [x4, y4]])
            if label == '*' or label == '###':
                text_tags.append(True)
            else:
                text_tags.append(False)
    return np.array(text_polys, dtype=np.float32), np.array(text_tags, dtype=bool)

Here, why is text_tag True when the label is '*' or '###', rather than False? Shouldn't it be the other way around? And if so, what about labels that do contain text?
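In ICDAR 2015 ground truth, '###' marks text regions that cannot be transcribed and are treated as "do not care". Read that way, a True tag presumably means "ignore this box", not "this box has text". Boxes with a real transcription get False and are trained on normally. Below is a hypothetical sketch of how such a flag can feed a training mask. It uses axis-aligned boxes for brevity, whereas the repository works with quadrilaterals.

```python
import numpy as np


def build_training_mask(shape, boxes, ignore_tags):
    """1 = pixel contributes to the loss, 0 = ignored ("do not care").

    boxes:       list of axis-aligned boxes (x1, y1, x2, y2) in integer pixels
    ignore_tags: True for boxes whose label was '*' or '###'
    """
    mask = np.ones(shape, dtype=np.uint8)
    for (x1, y1, x2, y2), ignore in zip(boxes, ignore_tags):
        if ignore:
            mask[y1:y2, x1:x2] = 0
    return mask
```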
