
captcha_spider's Introduction

A simple captcha spider framework.

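A typical captcha acquisition flow consists of three main steps:

  1. Pre-request: obtain the parameters needed for the captcha
  2. Captcha request: fetch the captcha
  3. Verification request: let the official site judge whether the captcha answer is correct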

Implement the concrete flow by subclassing the Project class (utils.Project):

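  1. def before_process() -> dict: returns a dictionary of parameters needed by the other steps, accessible through self.before_params
  2. def captcha_process() -> bytes: returns the captcha image as bytes
  3. def feedback_process() -> bool: returns the verification result, i.e. whether the captcha was recognized correctly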
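The three methods above can be sketched as follows. This is a minimal, self-contained illustration, not the framework's actual code: the `Project` base class here is a hypothetical stand-in for `utils.Project`, and `run_once`, `DemoProject`, and the `token` parameter are assumptions introduced only so the example runs without network access.

```python
from typing import Tuple


class Project:
    """Hypothetical stand-in for utils.Project so this sketch runs on its own."""

    def __init__(self) -> None:
        self.before_params: dict = {}

    def before_process(self) -> dict:
        return {}

    def captcha_process(self) -> bytes:
        raise NotImplementedError

    def feedback_process(self) -> bool:
        # Assumed default: no verification, trust the recognizer.
        return True

    def run_once(self) -> Tuple[bytes, bool]:
        # Assumed driver: run the three steps in order for one sample.
        self.before_params = self.before_process()
        image = self.captcha_process()
        return image, self.feedback_process()


class DemoProject(Project):
    def before_process(self) -> dict:
        # Step 1: a real spider would send the pre-request here.
        return {"token": "example-token"}

    def captcha_process(self) -> bytes:
        # Step 2: a real spider would download the image using the token.
        token = self.before_params["token"]
        return b"fake-image-for-" + token.encode()

    def feedback_process(self) -> bool:
        # Step 3: a real spider would submit the recognized text to the site.
        return "token" in self.before_params


image, ok = DemoProject().run_once()
print(len(image), ok)
```

In a real spider, each method body would be replaced with the corresponding HTTP request for the target site.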

If you fully trust the recognition results and do not plan to run the verification step, you can do the following:

from utils import Project, ServiceType, Charset
project = Project(
    captcha_length=4,
    captcha_charset=Charset.ALPHABET,
    service_type=ServiceType.Kerlomz,
    captcha_url="https://en.exmail.qq.com/cgi-bin/getverifyimage"
)
project.start(1000)

In const.json, fill in your own Lianzhong account, Baidu API credentials, and the directory where samples are saved:

{
  "baidu":  {
    "app_id":  "app_id",
    "api_key": "api_key",
    "secret_key": "secret_key"
  },
  "lianzhong": {
    "username": "username",
    "password": "password"
  },
  "target_dir": "D:/Samples"
}
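Before starting a run, it may help to check that const.json is complete. The sketch below is one possible way to do so, based only on the keys shown above; `load_config` and `REQUIRED_KEYS` are hypothetical helpers and are not part of the framework.

```python
import json

# Keys taken from the const.json example above.
REQUIRED_KEYS = {
    "baidu": ("app_id", "api_key", "secret_key"),
    "lianzhong": ("username", "password"),
}


def load_config(text: str) -> dict:
    """Parse const.json content and fail early if a required key is empty or missing."""
    config = json.loads(text)
    for section, keys in REQUIRED_KEYS.items():
        missing = [k for k in keys if not config.get(section, {}).get(k)]
        if missing:
            raise ValueError(f"const.json: section '{section}' is missing {missing}")
    if not config.get("target_dir"):
        raise ValueError("const.json: 'target_dir' is missing")
    return config


SAMPLE = """
{
  "baidu": {"app_id": "app_id", "api_key": "api_key", "secret_key": "secret_key"},
  "lianzhong": {"username": "username", "password": "password"},
  "target_dir": "D:/Samples"
}
"""

config = load_config(SAMPLE)
print(config["target_dir"])
```

With a file on disk, the same function could be called as `load_config(open("const.json", encoding="utf-8").read())`.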

Workflow:

  1. Fill in const.json
  2. Create your own spider under the spiders package, using demo.py as a reference
  3. Run it from app.py

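The framework runs the entire crawl and verification flow. When connected to the Lianzhong platform, a recognition error automatically triggers the error-reporting interface so that the points are refunded. The framework is intended for development and learning; do not use it for illegal purposes. The demo.py example does not target any website.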
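The refund-on-error behaviour described above can be pictured roughly as below. This is only a sketch under stated assumptions: `collect_one` and its `recognize`, `verify`, and `report_error` callbacks are hypothetical placeholders, and no real Lianzhong API call is shown.

```python
from typing import Callable, List


def collect_one(
    recognize: Callable[[bytes], str],
    verify: Callable[[str], bool],
    report_error: Callable[[], None],
    image: bytes,
) -> bool:
    """Recognize one captcha and report the error if the site rejects the answer."""
    text = recognize(image)
    if verify(text):
        return True  # the sample can be saved as correctly labeled
    report_error()  # ask the platform to refund the points for this job
    return False


# Stand-in callbacks that simulate one correct and one wrong recognition.
reports: List[str] = []
ok = collect_one(
    lambda img: "abcd",
    lambda text: text == "abcd",
    lambda: reports.append("reported"),
    b"img",
)
bad = collect_one(
    lambda img: "abcx",
    lambda text: text == "abcd",
    lambda: reports.append("reported"),
    b"img",
)
print(ok, bad, reports)
```

Passing the three steps as callbacks keeps the loop independent of any particular recognition service.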

captcha_spider's People

Contributors

kerlomz
