zhaipro / easy12306
Automatic recognition of 12306 CAPTCHAs using machine learning algorithms
License: Artistic License 2.0
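Hello, I have studied some AI basics and now want to work through the whole process, from collecting data and preprocessing it to producing a model. Your project seems well suited to that. However, after reading the code I could not find the step that produces the model. I am very curious about that part; could you share the approach, or the data? Thanks.
Hi, the text part currently recognizes only the first word. Could you take a look at how this project splits the text part of the CAPTCHA? It can split out multiple words: https://github.com/libowei1213/12306_captcha/blob/master/image_utils.py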
Testing produces an error. The log is:
error: (-215:Assertion failed) channels == 1 || channels == 3 || channels == 4 in function 'cv::imencode'
Line 49 in d604e45
Line 49 should be changed to: texts, _ = load_data()
How is captcha.npz generated? I could not find a script for it in the code. Any help would be appreciated.
zhaipro@localhost ~/easy12306> python3 mlearn.py
Using TensorFlow backend.
Train on 10047 samples, validate on 1117 samples
Epoch 1/30
[=] - 14s 1ms/step - loss: 1.9007 - acc: 0.5465 - val_loss: 0.5589 - val_acc: 0.8478
Epoch 2/30
[=] - 14s 1ms/step - loss: 0.2237 - acc: 0.9438 - val_loss: 0.1225 - val_acc: 0.9678
Epoch 18/30
[=] - 14s 1ms/step - loss: 8.1089e-06 - acc: 1.0000 - val_loss: 0.0216 - val_acc: 0.9937
Epoch 30/30
[=] - 14s 1ms/step - loss: 8.0525e-06 - acc: 1.0000 - val_loss: 0.0211 - val_acc: 0.9937
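Hello, I would like to learn from the author's implementation and run the code end to end. So far I understand that:
1 I first need to run pretreatement.py to get the data.npz dataset;
2 baidu.py obtains the label recognition results through the Baidu API;
3 What should the third step be? mlearn.py and mlearn_for_image.py both need .npz or .npy files, and I do not know how to generate any of them. They are available on Google Drive, but I would like to know how they were generated.
Also, how many images can be downloaded? After I had downloaded about 1,800, large numbers of duplicate files started to appear.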
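For the duplicate downloads mentioned above, one possible cleanup is to drop files whose contents are byte-for-byte identical. The following is a minimal sketch, not part of the project: the function name `remove_duplicates` and the directory layout are assumptions, and it only catches exact byte duplicates, not images that merely look the same.

```python
import hashlib
from pathlib import Path


def remove_duplicates(directory):
    """Delete files whose bytes match an earlier file; return the removed names."""
    seen = set()
    removed = []
    # Sort so the result does not depend on directory listing order.
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            # Same content as a file already kept: remove this copy.
            path.unlink()
            removed.append(path.name)
        else:
            seen.add(digest)
    return removed
```

Calling `remove_duplicates("some_download_folder")` (a hypothetical folder name) after a download run would leave one copy of each distinct file.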
I copied your program and downloaded your model files (12306.image.model.h5, model.v3.0.h5) as described in your README. Unfortunately, an inscrutable problem occurred after I put the model files in the same folder as the rest of the program. Here is the error message from the Python console:
Traceback (most recent call last):
  File "C:\Users\XM8\Desktop\easy12306-master\easy12306-master\main.py", line 60, in <module>
    main(sys.argv[1])
IndexError: list index out of range
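The traceback shows `sys.argv[1]` being read when no command-line argument was given, so the script was most likely started without one. A minimal sketch of a guard that prints a usage message instead of failing follows. The helper name `parse_image_path` is hypothetical, and treating the argument as the path of a CAPTCHA image is an assumption; the traceback only shows that some argument is expected.

```python
def parse_image_path(argv):
    """Return the first command-line argument, or exit with a usage message."""
    if len(argv) < 2:
        # argv[0] is the script name; the expected argument must follow it.
        # Assumption: the argument is the path of the image to recognize.
        raise SystemExit(f"usage: python {argv[0]} <captcha image file>")
    return argv[1]
```

With this in place, the last line of the script would read `main(parse_image_path(sys.argv))`, and running it without arguments would print the usage line rather than an `IndexError`.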
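Requesting an update.
Your online API is in a league of its own; mine does not even come close. I am using the latest 12306.image.model.h5 and model.v2.0h5.
I suggest adding a usage tutorial. Newcomers run into many baffling environment problems when running the project.
When a CAPTCHA has multiple labels, is that not supported?
Which TensorFlow version does this project use?
A huge and interesting challenge has surfaced once again.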
E:\ProgramData\Anaconda3\lib\site-packages\keras\engine\saving.py:292: UserWarning: No training configuration found in save file: the model was not compiled. Compile it manually.
warnings.warn('No training configuration found in save file: '
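I get this error when running mlearn.py. How can I resolve it? Thanks.
When I feed the data from image.npz into the manually provided validation set part of mlearn_for_image.py,
the line new_test_x[idx] = cv2.resize(test_x[idx], (67, 67)) raises an error: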
File "mlearn_for_image.py", line 46, in load_data
new_test_x[idx] = cv2.resize(t, (67, 67))
TypeError: src is not a numpy array, neither a scalar
result_fn = sys.argv[1]
classify_fn = sys.argv[2]
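What are these two arguments?
http://shell.teachx.cn:12306/ returns results very quickly. What is the configuration of your server?
I turned main.py into an API, and it takes at least 2 s to return a result. Why?
Big brother, can you tell me, please? Thanks.
I want to learn.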
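One possible cause of the 2 s latency, not confirmed anywhere in the post, is that the API handler reloads the model files on every request. A minimal sketch of loading once per process and reusing the result follows. All names here are hypothetical: `load_model_slowly` stands in for whatever loader the real code uses, and `handle_request` for the API handler.

```python
import functools
import time


def load_model_slowly(path):
    """Hypothetical stand-in for an expensive model loader."""
    time.sleep(0.05)  # simulate the cost of reading weights from disk
    return {"path": path}


@functools.lru_cache(maxsize=None)
def get_model(path):
    """Load each model file once per process and reuse it afterwards."""
    return load_model_slowly(path)


def handle_request(image_bytes, model_path="model.h5"):
    """Hypothetical request handler: only the first call pays the load cost."""
    model = get_model(model_path)
    return {"model": model["path"], "size": len(image_bytes)}
```

If the handler already keeps the model in memory, the time is being spent elsewhere (for example in prediction itself), and this sketch would not help.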
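Using the dataset here for testing gave these results:
Accuracy of the statistics expert: 0.9422140966882884
Accuracy of the deep learning model that learned from the statistics expert: 0.9811081335640064
The statistical method recognizes paper cuttings with only 64% accuracy; my guess is that there are too many kinds of paper cutting.
The class the deep learning model recognizes worst is the wall clock: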
1577/1577 [==============================] - 42s 26ms/step
[0.24407617484627453, 0.9302473050095117]
My guess is that wall clocks and clocks are genuinely hard to tell apart.
Recognition of clocks:
1608/1608 [==============================] - 44s 27ms/step
[0.22922349847223036, 0.9359452736318408]
The deep learning model is most confident about treadmills:
1564/1564 [==============================] - 43s 27ms/step
[0.0026093199646667294, 1.0]
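The per-class figures above could also be computed in one pass from the true and predicted labels. The sketch below assumes that the two-element lists in the logs are [loss, accuracy] pairs (the post does not say so explicitly), and the function name and sample labels are illustrative only.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Return {label: fraction of samples with that true label predicted correctly}."""
    totals = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    # Counter returns 0 for labels that were never predicted correctly.
    return {label: correct[label] / totals[label] for label in totals}
```

Sorting the resulting dictionary by value would show the hardest classes first, which is the comparison the post makes between wall clocks, clocks, and treadmills.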
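Does this show that the trained neural network can recognize images it has never seen before?
Can we say that just 10,000 images are enough for learning?
Could the machine learn useful skills from even less training material?
In practice its recognition of CAPTCHAs is decent, but its recognition of real-world photos is not as good.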
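I read your wiki. If I understand correctly, texts.npy contains the CAPTCHAs that Baidu recognized incorrectly, which means this part of the data has no labeled ground truth.
In the wiki you mention that the model achieved satisfactory results on this data.
I am reproducing your code and have a question. How did you evaluate the model on this unlabeled data: by manually inspecting some predictions and estimating the accuracy from them, or with a quantifiable metric similar to the accuracy reported during training?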
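If manual inspection is the route taken, one common way to turn a spot check into a number is to hand-check a random sample of predictions and report a confidence interval for the accuracy. This is a general statistical sketch, not a description of what the author did; the function name is hypothetical.

```python
import math


def wilson_interval(correct, checked, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if checked <= 0:
        raise ValueError("checked must be positive")
    p = correct / checked
    denom = 1 + z * z / checked
    centre = (p + z * z / (2 * checked)) / denom
    half = z * math.sqrt(
        p * (1 - p) / checked + z * z / (4 * checked * checked)
    ) / denom
    return centre - half, centre + half
```

For example, `wilson_interval(95, 100)` gives the plausible accuracy range when 95 of 100 randomly chosen predictions were judged correct by hand. The interval narrows as more samples are checked.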