tianyancha

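Tianyancha company information crawler: scrapes enterprise data from the Tianyancha (天眼查) enterprise credit information lookup system.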


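Run main_all.py to crawl about 90% of the companies listed on Tianyancha.

Run main_top100.py to crawl only the top 100 companies in each of 96 industries.

Run main_search.py to search by company name and crawl the specified company.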

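Notes:

1. Proxy IPs are not included: pay for them yourself, or build your own pool of free IPs.

2. The project ships with a cloud database that can be used as-is. Running the crawler on your own machine adds that machine to the crawl, which I like to call a "distributed" crawler.

3. To use your own database, edit config.py and set your database connection details and your Mogu proxy (蘑菇代理) appkey.

4. The program can be stopped at any time and later resumed from where it left off, without re-crawling anything.

5. For other questions, or if you just want the data, join QQ group 231436610.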
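Note 4 describes stop-and-resume behaviour. One common way to get it is a task table with a "done" flag that is committed after every company. The sketch below is only an illustration of that idea, not the project's actual code: the table name crawl_task, the functions init_tasks and run, and the use of sqlite3 in place of MySQL are all assumptions made so the example runs on its own.

```python
import sqlite3


def init_tasks(conn, company_ids):
    """Seed the (hypothetical) task table; ids that already exist are left untouched."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crawl_task ("
        "company_id TEXT PRIMARY KEY, done INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO crawl_task (company_id) VALUES (?)",
        [(cid,) for cid in company_ids],
    )
    conn.commit()


def run(conn, fetch, limit=None):
    """Crawl pending companies only; return the ids handled in this run."""
    rows = conn.execute(
        "SELECT company_id FROM crawl_task WHERE done = 0 ORDER BY company_id"
    ).fetchall()
    processed = []
    for (company_id,) in rows[:limit]:
        fetch(company_id)  # stand-in for the real page download and parse
        conn.execute(
            "UPDATE crawl_task SET done = 1 WHERE company_id = ?", (company_id,)
        )
        conn.commit()  # commit per company so an interruption loses at most one item
        processed.append(company_id)
    return processed
```

Calling run again after an interruption only picks up rows whose done flag is still 0, so finished companies are not fetched a second time.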

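Step 1: Look in the data folder and check whether the data there already meets your analysis needs.

  • 公司信息.xlsx: more than 20,000 company records.
  • 行业TOP100.sql: the top 100 companies in each industry, roughly 9,000 records.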

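Step 2: If the data does not meet your needs and you want to get some hands-on practice (and spend a little money for fun):

  1. Install MySQL and create the tables. The table creation scripts are in the sql folder.

  2. Buy proxies from Mogu proxy (蘑菇代理) or another proxy pool provider.

  3. Edit config.py: set the database configuration, the Mogu proxy API appkey, and the number of retries the crawler makes on errors.

  4. Run main_all.py and/or main_top100.py, depending on what you need.

  5. Before running test.main, run find_industry.py once. It initializes the industry records for 96 industries × 5 pages.

  6. If errors occur, debug step by step. get_html.py and find_info.py have commented-out debugging code at the bottom; modify and run it as needed.

  7. If everything runs fine, please give the project a star. Enjoy your crawler's journey to prison (๑•̀ㅂ•́)و✧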
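Step 2.3 mentions three kinds of settings in config.py: database connection details, the proxy appkey, and a retry count. The real file's variable names are not shown here, so the following is only a rough sketch of how such settings and a retry loop could fit together. CrawlerConfig, every field name and default value, and fetch_with_retry are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class CrawlerConfig:
    # All field names and defaults are assumptions, not the project's real config.py.
    db_host: str = "127.0.0.1"
    db_port: int = 3306
    db_user: str = "root"
    db_password: str = ""
    db_name: str = "tianyancha"
    proxy_appkey: str = "YOUR_APPKEY"  # placeholder for the proxy API appkey
    max_retries: int = 3               # retries after the first failed attempt


def fetch_with_retry(fetch, url, max_retries, delay=0.0):
    """Call fetch(url); on failure retry up to max_retries more times."""
    last_error = None
    for attempt in range(1 + max_retries):
        try:
            return fetch(url)
        except Exception as exc:  # a real crawler would catch narrower errors
            last_error = exc
            if attempt < max_retries:
                time.sleep(delay)  # pause before the next attempt
    raise RuntimeError(
        f"giving up on {url} after {max_retries} retries"
    ) from last_error
```

Passing the fetch function in as an argument keeps the retry logic independent of whichever HTTP client and proxy setup the crawler actually uses.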

Contributors: yzlit, yezl77
