yeguixin / captcha_solver

Source code for ACM CCS 2018

Python 100.00%

captcha_solver's Introduction

captcha_solver

Contributors: Guixin Ye, Zhanyong Tang, Dingyi Fang, Zhanxing Zhu, Yansong Feng, Pengfei Xu, Xiaojiang Chen, Zheng Wang
Affiliations: Northwest University, China; Lancaster University, UK; Peking University, China

This repository contains open-source code for solving text-based captchas with machine learning. The approach achieves a higher success rate while requiring significantly fewer real captchas. For security reasons, we release only the part of the source code that can run independently. Note that it is not production ready. If you encounter any problems, please file an issue on GitHub.

Requirements

Linux or Windows
NVIDIA GPU with CUDA and cuDNN (CPU mode, or CUDA without cuDNN, may work with minimal modification, but this is untested)

License

Source code of this repository is released under the Apache License (v2.0)

Citation

@inproceedings{ye2018yet,     
  title={Yet Another Text Captcha Solver: A Generative Adversarial Network Based Approach},  
  author={Ye, Guixin and Tang, Zhanyong and Fang, Dingyi and Zhu, Zhanxing and Feng, Yansong and Xu, Pengfei and Chen, Xiaojiang and Wang, Zheng},  
  booktitle={The 25th ACM Conference on Computer and Communications Security},     
  series = {CCS '18},     
  year={2018},     
  organization={ACM}    
  }

captcha_solver's Issues

About the parameter settings of Captcha Synthesizer

Hello! I have read your paper and have a question about the parameter settings of the Captcha Synthesizer. I could not find where these parameter settings are implemented in the code. So I assume you paste the captcha onto a white image, and that the rotation angle and the color-change parameters are learned by the generator network. Is that correct? Or did you implement the parameter settings by some other method? Thanks.

500 image samples for Synthetic generator

Could you please share the 500-sample dataset for the synthetic image generator? That would be greatly helpful for my studies.

Lenet model overfitting

Your paper used 200,000 synthetic images to train the LeNet5 model. I used the same amount of data, but my accuracy and loss are both declining, and I suspect there is an overfitting problem. Have you encountered this problem?
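
A generic way to check for overfitting is to compare the training and validation loss per epoch. The sketch below is a plain-Python diagnostic, not code from this repository; the function name `first_overfit_epoch`, the `patience` parameter, and the loss values are illustrative assumptions.

```python
from typing import Optional, Sequence


def first_overfit_epoch(
    train_loss: Sequence[float],
    val_loss: Sequence[float],
    patience: int = 2,
) -> Optional[int]:
    """Return the epoch with the best validation loss if validation loss then
    fails to improve for `patience` epochs while training loss is still lower
    than it was at that best epoch. Return None if no such pattern is found."""
    if len(train_loss) != len(val_loss):
        raise ValueError("train_loss and val_loss must have the same length")

    best_epoch = 0
    for epoch in range(1, len(val_loss)):
        if val_loss[epoch] < val_loss[best_epoch]:
            # Validation loss improved, so this is the new reference point.
            best_epoch = epoch
        elif (
            epoch - best_epoch >= patience
            and train_loss[epoch] < train_loss[best_epoch]
        ):
            # Training keeps improving but validation does not: overfitting.
            return best_epoch
    return None


# Hypothetical loss curves, for illustration only.
overfit_train = [1.0, 0.8, 0.6, 0.4, 0.3]
overfit_val = [1.1, 0.9, 0.95, 1.0, 1.2]
healthy_val = [1.1, 0.9, 0.8, 0.7, 0.6]
```

If the function returns an epoch index, restoring the weights saved at that epoch (early stopping) is a common remedy; if it returns None, the cause of a falling accuracy probably lies elsewhere, for example in the labels or the preprocessing.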

Code to remove security features before training it in PIX2PIX

pretrain-model

Could you release the pretrained model?

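About the Captcha Synthesizer

Many thanks to the authors. After reading the paper, I think the proposed idea is great: recognizing captchas from a small number of samples by synthesizing captchas. I have one question. The paper says that a generator first produces captchas, and a GAN then fine-tunes them at the pixel level so that the synthetic captchas look more like real ones. What is the structure of this fine-tuning GAN? Could you explain it in detail? I look forward to your reply.
My own guess is that the GAN here is an image-based conditional GAN, but pixel-level fine-tuning seems difficult to achieve with it. I would appreciate the authors' guidance.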

about code

@yeguixin I am very interested in your paper, but which part of the code generates synthetic captchas without background confusion?
And where is the preprocessing code?
I look forward to your reply. Thank you very much!

How do you get the labels of synthesized captchas?

Sorry for disturbing. Would you mind explaining how you get the labels of the synthesized captchas?
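
For captchas that are rendered from a chosen string, the label is known by construction. The sketch below illustrates that idea in plain Python. It is an assumption about how a synthesis pipeline could record labels, not the authors' confirmed method; the names `make_labelled_jobs` and `ALPHABET` and the file-name pattern are hypothetical, and the rendering step itself is omitted.

```python
import random
import string
from typing import List, Tuple

# Hypothetical character set; a real captcha scheme defines its own.
ALPHABET = string.ascii_uppercase + string.digits


def make_labelled_jobs(
    count: int, length: int = 4, seed: int = 0
) -> List[Tuple[str, str]]:
    """Create (filename, label) pairs for captchas that are yet to be rendered.

    The text is chosen first, so the label exists before any image does."""
    rng = random.Random(seed)  # seeded so the job list is reproducible
    jobs = []
    for index in range(count):
        text = "".join(rng.choice(ALPHABET) for _ in range(length))
        filename = f"synthetic_{index:06d}.png"
        jobs.append((filename, text))
    return jobs


jobs = make_labelled_jobs(3)
```

Each pair would then be passed to whatever renders the image, and the same list doubles as the training label file.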

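A few points in the paper that I do not understand

First of all, thank you for this paper; it was very inspiring.
I read the paper carefully and there are a few points I do not quite understand. I hope you can clarify them.

My understanding of the paper

The captcha generator produces two kinds of captchas (with security features and without security features). This generator is implemented with OpenCV or Pillow rather than being a model.
The captcha synthesizer takes the with-security captchas (from the previous step) as input and outputs real-like captchas; the discriminator distinguishes the model's output captchas from real captchas.
Preprocessing takes the output of the captcha synthesizer as input and generates without-security images (the kind produced in the first step); the discriminator distinguishes the model's output from the without-security captchas of the first step.

Question: judging from Figures 11 and 12, the images after preprocessing do not look like the generated without-security captchas. So how are the target images for preprocessing obtained?

The input to the subsequent LeNet is the without-security captcha (after preprocessing).

My questions

"We use the grid search method presented in [4] to search for the optimal parameters for a given captcha scheme": how does this grid search evaluate candidates? Does it exhaustively train the generator for every combination and then compare the results?
"Training the synthesizer takes around 2 days for one captcha scheme on our platform": is this because the model is complex?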
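
The exhaustive reading of grid search asked about above can be sketched as follows. This is a generic illustration, not the procedure from [4] or the authors' code: the parameter names and the `toy_score` function are hypothetical stand-ins for "run the generator with these settings and measure how real-like the output is".

```python
import itertools
from typing import Any, Callable, Dict, Mapping, Optional, Sequence, Tuple


def grid_search(
    grid: Mapping[str, Sequence[Any]],
    score: Callable[[Dict[str, Any]], float],
) -> Tuple[Optional[Dict[str, Any]], float]:
    """Evaluate every combination in `grid` and return the best one."""
    names = list(grid)
    best_params: Optional[Dict[str, Any]] = None
    best_score = float("-inf")
    # itertools.product enumerates the full Cartesian product of the values.
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(params: Dict[str, Any]) -> float:
    """Hypothetical score that peaks at 15 degrees and a 0.1 noise level."""
    return -abs(params["rotation_deg"] - 15) - 100 * abs(params["noise_level"] - 0.1)


# Hypothetical parameter grid, for illustration only.
grid = {"rotation_deg": [0, 15, 30], "noise_level": [0.0, 0.1, 0.2]}
best_params, best_score = grid_search(grid, toy_score)
```

The cost grows with the product of the list lengths, so if each evaluation involved training a model, even a small grid would be expensive.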

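Some points about the Synthesizer that I do not understand

Hello. While reading your paper, I felt that the Synthesizer section contains some apparent contradictions.

In Figure 4, you indicate that the captcha image generator only adds security features to the text, while background noise and the like are left to the network to generate. But in Figure 5, you indicate that things like the background, slanted lines, and noise need to be specified by parameters. So is the background noise learned by the network, or is it already generated in the first step by the captcha image generator?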

file real/real_train.txt

Hey,
While trying to run your code I get the error "no such file real_train.txt". Could you please tell me what the expected content of this file is?

dataset

Your article is great, but when I run the code I find that there is no dataset. Where can I get it? Looking forward to your reply.

Release of dataset?

Hello. Is it possible to receive the various captcha datasets used in your paper?

questions about CAPTCHA solver

Q1: Can your solver handle images that contain repeated characters?
Q2: Can the solver output the characters in the same order as they appear in the real image?
Looking forward to your answers. Thanks!
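
Whether the released solver handles repeated characters and preserves character order is for the authors to confirm. As a general illustration, if a recognizer predicts one character per position (an assumption, not something stated in this repository), both properties follow naturally, because each position is decoded independently and in order. The function `decode_positions` and the toy scores below are hypothetical.

```python
from typing import Sequence


def decode_positions(scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Pick the highest-scoring character at each position, left to right."""
    chars = []
    for position_scores in scores:
        if len(position_scores) != len(alphabet):
            raise ValueError("each position needs one score per character")
        # Index of the largest score for this position.
        best_index = max(range(len(alphabet)), key=position_scores.__getitem__)
        chars.append(alphabet[best_index])
    return "".join(chars)


# Toy scores for a three-character captcha over the alphabet "ABC".
toy_scores = [
    [0.90, 0.05, 0.05],  # position 0 favours "A"
    [0.80, 0.10, 0.10],  # position 1 also favours "A" (a repeated character)
    [0.10, 0.20, 0.70],  # position 2 favours "C"
]
decoded = decode_positions(toy_scores, "ABC")
```

Because positions are decoded separately, the same character can win at several positions, and the output order matches the position order.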

small set

@yeguixin How can you train the network well with only 500 real captchas? Is it because the network is simple, or because of other techniques? Could you give me some suggestions? Looking forward to your reply!

the question about Generator

In the code that uses image.ImageDataGenerator(), I found that you commented out some of the image transformation code. Should I add this code back in this project?

Whether the code is complete

I do not understand where the generator network and the discriminator network are.

Is this code complete?

I browsed the paper and the code but can't find any GAN mechanism in it. The paper says:

"Our captcha generator model includes a image generator and a generator network. The image generator produces a captcha image at the word level, and the generator network modifies the produced captcha image at the pixel level to add security features."
But there is only a Keras image generator in your code, without any generator network. I can't find any trace of a GAN.
